CN104579357B - The method and apparatus for handling compressed file - Google Patents
- Publication number
- CN104579357B (application number CN201510016383.6A)
- Authority
- CN
- China
- Prior art keywords
- lzo
- plug
- file
- hadoop
- units
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a method and apparatus for handling compressed files that enable Presto to support the LZO compression format. The method comprises: when the Presto server starts, importing the Hadoop-lzo plug-in as a third-party plug-in; and, when the Presto server reads a file in LZO compressed format, processing that file using the Hadoop-lzo plug-in.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a method and apparatus for handling compressed files.
Background art
In the big data field, data compression is a very important technique. Storing massive data in compressed form saves server storage, improves data-processing efficiency, reduces memory and disk I/O overhead, and improves the efficiency of SQL queries over big data.
LZO (Lempel-Ziv-Oberhumer) is a data compression algorithm focused on decompression speed. The algorithm is lossless and its reference implementation is thread-safe. Its features include simple and very fast decompression, no additional memory requirement for decompression, and fairly fast compression.
Presto and LZO are released under different open-source licenses. Presto follows the Apache License 2.0 (a free software license published by the Apache Software Foundation), whereas LZO follows the GPL (General Public License, a widely used free software license). Presto therefore cannot integrate the LZO source code into its own source code in order to support the LZO compression format.
Summary of the invention
In view of this, the present invention provides a method and apparatus for handling compressed files that enable Presto to support the LZO compression format.
To achieve the above object, according to one aspect of the present invention, a method for handling compressed files is provided.
The method of the present invention comprises: when the Presto server starts, importing the Hadoop-lzo plug-in as a third-party plug-in; and, when the Presto server reads a file in LZO compressed format, processing that file using the Hadoop-lzo plug-in.
Optionally, when the Presto server reads a file in LZO compressed format, the method further comprises: determining whether an index file exists for the file in LZO compressed format and, if so, splitting the file into multiple data slices according to the index file. The step of processing the file in LZO compressed format using the Hadoop-lzo plug-in then comprises: processing the multiple data slices in parallel using the Hadoop-lzo plug-in.
Optionally, when data in LZO compressed format is processed using the Hadoop-lzo plug-in, an LZO decompression function is called, wherein the LZO decompression function inherits a general decompression function and overrides it using the interface provided by the Hadoop-lzo plug-in.
According to another aspect of the present invention, an apparatus for handling compressed files is provided.
The apparatus of the present invention comprises: a plug-in import module, configured to import the Hadoop-lzo plug-in as a third-party plug-in when the Presto server starts; and a processing module, configured to process a file in LZO compressed format using the Hadoop-lzo plug-in when the Presto server reads such a file.
Optionally, the processing module is further configured to, when the Presto server reads a file in LZO compressed format, determine whether an index file exists for that file and, if so, split the file into multiple data slices according to the index file, and to process the multiple data slices in parallel using the Hadoop-lzo plug-in.
Optionally, the processing module is further configured to call an LZO decompression function when data in LZO compressed format is processed using the Hadoop-lzo plug-in, wherein the LZO decompression function inherits a general decompression function and overrides it using the interface provided by the Hadoop-lzo plug-in.
According to the technical solution of the present invention, when the Presto server starts, the Hadoop-lzo plug-in is imported as a third-party plug-in and is used to process files in LZO compressed format. The Hadoop-lzo plug-in provides interfaces for performing various kinds of processing on LZO compressed files, so LZO compressed files can be processed through the public interfaces the plug-in provides, without the license incompatibility that using the LZO source code directly would cause. Presto can thus support the LZO compression format. In addition, using the LZO index to split an LZO file into slices and process them in parallel can further increase data-processing speed. If other compression formats need to be supported, it is only necessary to add a new plug-in and override the general decompression function using the interface that plug-in provides, which makes the system easy to extend.
Description of the drawings
The drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the relationship among the Presto server, the plug-in interface, and third-party plug-ins according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the basic steps of the method for handling compressed files according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the main modules of the apparatus for handling compressed files according to an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present invention are explained below with reference to the drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.
The embodiments of the present invention make use of the support for third-party plug-ins that Presto provides. The basic principle is as follows: a third-party plug-in is added to the Presto server's plug-in set, and when the Presto server starts it dynamically imports the third-party plug-in. At run time, Presto dynamically looks up and binds the third-party plug-in and calls the interface the plug-in provides, so that the plug-in's functionality is integrated into Presto. In the embodiments of the present invention, the Hadoop-lzo plug-in is added to the Presto plug-in set as a third-party plug-in, and Presto binds to it dynamically. Fig. 1 is a schematic diagram of the relationship among the Presto server, the plug-in interface, and third-party plug-ins according to an embodiment of the present invention.
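The start-up import and run-time binding described above can be illustrated with a small sketch. It is written in Python for brevity and is not Presto's actual plug-in API: `PluginRegistry`, `load_plugins`, the `register` hook, and the `demo_lzo_plugin` module are hypothetical names, and `zlib` stands in for an LZO codec because the Python standard library does not include one.

```python
import importlib
import sys
import types
import zlib


class PluginRegistry:
    """Holds the codecs contributed by plug-ins, keyed by file extension."""

    def __init__(self):
        self.codecs = {}

    def register_codec(self, extension, decompress):
        self.codecs[extension] = decompress


def load_plugins(registry, module_names):
    """Import each configured plug-in at start-up and let it register itself."""
    for name in module_names:
        module = importlib.import_module(name)  # dynamic lookup by name
        module.register(registry)               # binding through the agreed hook


# Hypothetical third-party plug-in. It would normally be installed separately;
# it is built in memory here only so that the sketch runs on its own.
demo_plugin = types.ModuleType("demo_lzo_plugin")
demo_plugin.register = lambda reg: reg.register_codec(".lzo", zlib.decompress)
sys.modules["demo_lzo_plugin"] = demo_plugin

registry = PluginRegistry()
load_plugins(registry, ["demo_lzo_plugin"])
```

The server code never imports the codec directly. It only knows the plug-in's name and the `register` hook, which mirrors the separation that lets the host avoid incorporating the codec's source code.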
Fig. 2 is a schematic diagram of the basic steps of the method for handling compressed files according to an embodiment of the present invention. The method is performed by the Presto server.
Step S21: The Presto server starts.
Step S22: The Hadoop-lzo plug-in is imported as a third-party plug-in.
Step S23: Data is read from the data source.
Step S24: It is determined whether the data read is a file in LZO compressed format; if so, the method proceeds to step S25, otherwise to step S26.
Step S25: The file in LZO compressed format is processed using the Hadoop-lzo plug-in.
Step S26: The file is processed according to its own format.
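Steps S24 to S26 amount to a dispatch on the format of the data that was read. A minimal Python sketch follows, assuming the format is recognised from a `.lzo` file extension (the source does not say how the format is detected). The function names are hypothetical, and `zlib` is a stand-in for the call into the Hadoop-lzo plug-in.

```python
import zlib


def handle_lzo(data):
    # Step S25: delegate to the plug-in (zlib is only a stand-in codec here).
    return zlib.decompress(data)


def handle_default(data):
    # Step S26: any other format is handled by its own existing path.
    return data


def process_file(path, data):
    """Step S24: choose a handler based on the format of the file read."""
    if path.endswith(".lzo"):
        return handle_lzo(data)
    return handle_default(data)
```

For example, `process_file("part-0000.lzo", payload)` goes through the plug-in path, while a `.txt` file is passed to the default handler unchanged.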
In the above flow, it can also be determined whether an index file exists for the file in LZO compressed format that was read. If so, the file is split into multiple data slices according to the index file. When the file is then processed, the Hadoop-lzo plug-in processes these data slices in parallel, which further improves data-processing efficiency.
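The index-based splitting and parallel processing can be sketched as follows. The index layout used here, a flat run of 8-byte big-endian block start offsets, is an assumption for illustration, since the source does not specify it. All function names are hypothetical, the toy file has no header, and `zlib` again stands in for the LZO codec.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor


def read_index(index_bytes):
    """Decode a flat sequence of 8-byte big-endian block start offsets."""
    count = len(index_bytes) // 8
    return list(struct.unpack(f">{count}q", index_bytes[: count * 8]))


def split_by_index(file_size, offsets):
    """Turn block start offsets into (start, end) byte ranges, one per slice."""
    ends = offsets[1:] + [file_size]
    return list(zip(offsets, ends))


def process_slices(data, slices, workers=4):
    """Decompress every slice independently and in parallel, keeping order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: zlib.decompress(data[s[0]:s[1]]), slices))


# Build a toy file of independently compressed blocks together with its index.
blocks = [zlib.compress(text) for text in (b"alpha,", b"beta,", b"gamma")]
offsets, position = [], 0
for block in blocks:
    offsets.append(position)
    position += len(block)
compressed = b"".join(blocks)
index_bytes = struct.pack(f">{len(offsets)}q", *offsets)

slices = split_by_index(len(compressed), read_index(index_bytes))
result = b"".join(process_slices(compressed, slices))
```

Splitting is only possible because each block can be decompressed without the blocks before it. Without an index the reader does not know where blocks begin, so the file has to be processed as a single unit.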
When a file in LZO compressed format is processed, the LZO compression and decompression functions are called. Each LZO function inherits the corresponding general compression or decompression function and overrides it using the interface provided by the Hadoop-lzo plug-in. This approach makes the system easy to extend: when other compression formats need to be supported, it is only necessary to add a new plug-in and override the general function using the interface that plug-in provides. The description above uses decompression as the example, but it applies equally where data needs to be compressed.
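The inherit-and-override arrangement can be shown with a short sketch. The class and function names are hypothetical, the plug-in's interface is represented by a plain callable, and `zlib.decompress` is used in place of a real LZO routine.

```python
import zlib


class GenericDecompressor:
    """General decompression function that format-specific codecs inherit."""

    def decompress(self, data):
        raise NotImplementedError("no codec available for this format")


class LzoDecompressor(GenericDecompressor):
    """Overrides the general function by delegating to the plug-in interface."""

    def __init__(self, plugin_decompress):
        self._plugin_decompress = plugin_decompress

    def decompress(self, data):
        return self._plugin_decompress(data)


def read_records(decompressor, data):
    """Calling code depends only on the general interface."""
    return decompressor.decompress(data).split(b"\n")


lzo = LzoDecompressor(zlib.decompress)
records = read_records(lzo, zlib.compress(b"row1\nrow2"))
```

Supporting another format under this design means adding one more subclass that wraps the new plug-in's callable, while `read_records` stays unchanged.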
Fig. 3 is a schematic diagram of the main modules of the apparatus for handling compressed files according to an embodiment of the present invention. As shown in Fig. 3, the apparatus 30 for handling compressed files mainly comprises a plug-in import module 31 and a processing module 32. The plug-in import module 31 is configured to import the Hadoop-lzo plug-in as a third-party plug-in when the Presto server starts. The processing module 32 is configured to process a file in LZO compressed format using the Hadoop-lzo plug-in when the Presto server reads such a file.
The processing module 32 can further be configured to, when the Presto server reads a file in LZO compressed format, determine whether an index file exists for that file and, if so, split the file into multiple data slices according to the index file, and to process the multiple data slices in parallel using the Hadoop-lzo plug-in.
The processing module 32 can also be configured to call an LZO decompression function when data in LZO compressed format is processed using the Hadoop-lzo plug-in, wherein the LZO decompression function inherits a general decompression function and overrides it using the interface provided by the Hadoop-lzo plug-in.
According to the technical solution of the embodiments of the present invention, when the Presto server starts, the Hadoop-lzo plug-in is imported as a third-party plug-in and is used to process files in LZO compressed format. The Hadoop-lzo plug-in provides interfaces for performing various kinds of processing on LZO compressed files, so LZO compressed files can be processed through the public interfaces the plug-in provides, without the license incompatibility that using the LZO source code directly would cause. Presto can thus support the LZO compression format. In addition, using the LZO index to split an LZO file into slices and process them in parallel can further increase data-processing speed. If other compression formats need to be supported, it is only necessary to add a new plug-in and override the general decompression function using the interface that plug-in provides, which makes the system easy to extend.
The basic principle of the present invention has been described above with reference to specific embodiments. In the apparatus and method of the present invention, the components or steps can obviously be decomposed and/or recombined, and such decompositions and/or recombinations should be regarded as equivalent solutions of the present invention. Moreover, the steps of the series of processes described above can naturally be performed in the chronological order in which they are described, but they need not be performed in that order. Some steps can be performed in parallel or independently of one another.
The specific embodiments above do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can occur depending on design requirements and other factors. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (2)
- 1. A method for handling compressed files, characterized by comprising: when a Presto server starts, importing the Hadoop-lzo plug-in as a third-party plug-in; when the Presto server reads a file in LZO compressed format, processing the file in LZO compressed format using the Hadoop-lzo plug-in; when data in LZO compressed format is processed using the Hadoop-lzo plug-in, calling an LZO compression or decompression function, wherein the LZO compression or decompression function inherits a general compression or decompression function and overrides it using the interface provided by the Hadoop-lzo plug-in; when other compression formats need to be supported, adding a new plug-in and overriding the general compression or decompression function using the interface provided by that plug-in; wherein, when the Presto server reads a file in LZO compressed format, the method further comprises: determining whether an index file exists for the file in LZO compressed format and, if so, splitting the file in LZO compressed format into multiple data slices according to the index file; and wherein the step of processing the file in LZO compressed format using the Hadoop-lzo plug-in comprises: processing the multiple data slices in parallel using the Hadoop-lzo plug-in.
- 2. An apparatus for handling compressed files, characterized by comprising: a plug-in import module, configured to import the Hadoop-lzo plug-in as a third-party plug-in when a Presto server starts; and a processing module, configured to process a file in LZO compressed format using the Hadoop-lzo plug-in when the Presto server reads such a file; wherein the processing module is further configured to call an LZO compression or decompression function when data in LZO compressed format is processed using the Hadoop-lzo plug-in, wherein the LZO compression or decompression function inherits a general compression or decompression function and overrides it using the interface provided by the Hadoop-lzo plug-in; when other compression formats need to be supported, a new plug-in is added and the general compression or decompression function is overridden using the interface provided by that plug-in; and wherein the processing module is further configured to, when the Presto server reads a file in LZO compressed format, determine whether an index file exists for the file in LZO compressed format and, if so, split the file in LZO compressed format into multiple data slices according to the index file, and to process the multiple data slices in parallel using the Hadoop-lzo plug-in.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510016383.6A CN104579357B (en) | 2015-01-13 | 2015-01-13 | The method and apparatus for handling compressed file |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510016383.6A CN104579357B (en) | 2015-01-13 | 2015-01-13 | The method and apparatus for handling compressed file |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104579357A CN104579357A (en) | 2015-04-29 |
CN104579357B true CN104579357B (en) | 2018-06-22 |
Family
ID=53094688
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510016383.6A Active CN104579357B (en) | 2015-01-13 | 2015-01-13 | The method and apparatus for handling compressed file |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104579357B (en) |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6826301B2 (en) * | 2002-10-07 | 2004-11-30 | Infocus Corporation | Data transmission system and method |
US8497788B1 (en) * | 2012-04-25 | 2013-07-30 | Pure Storage Inc. | Efficient techniques for aligned fixed-length compression |
CN102708187B (en) * | 2012-05-14 | 2014-04-30 | 成都信息工程学院 | Reverse index mixed compression and decompression method based on Hbase database |
CN102970158B (en) * | 2012-11-05 | 2017-02-08 | 广东睿江云计算股份有限公司 | Log storage and processing method and log server |
CN103366015B (en) * | 2013-07-31 | 2016-04-27 | 东南大学 | A kind of OLAP data based on Hadoop stores and querying method |
Non-Patent Citations (3)
Title |
---|
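hadoop中使用lzo的压缩 [Using LZO compression in Hadoop];tinyid;《http://blog.csdn.net/xiaolang85/article/details/8649756》;20130308;pp. 1-2 * |
lzo本地压缩与解压缩实例 [An example of local LZO compression and decompression];喜啊;《http://blog.csdn.net/scorpiohjx2/article/details/18423529》;20140117;p. 1 * |
另一种扩展并加速Hadoop计算能力的计算架构—Presto [Presto: another computing architecture that extends and accelerates Hadoop's computing capability];tinyid;《http://blog.csdn.net/cnweike/article/details/39519059》;20140925;p. 4 * |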
Also Published As
Publication number | Publication date |
---|---|
CN104579357A (en) | 2015-04-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||