CN104579357B

CN104579357B - Method and apparatus for processing compressed files

Info

Publication number: CN104579357B
Application number: CN201510016383.6A
Authority: CN
Inventors: 袁安峰; 吕信
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2015-01-13
Filing date: 2015-01-13
Publication date: 2018-06-22
Anticipated expiration: 2035-01-13
Also published as: CN104579357A

Abstract

The present invention provides a method and apparatus for processing compressed files that enable Presto to support the LZO compression format. The method for processing compressed files of the present invention comprises: when a Presto server starts, importing the Hadoop-lzo plug-in as a third-party plug-in; and, when the Presto server reads a file in the lzo compression format, processing that file using the Hadoop-lzo plug-in.

Description

Method and apparatus for processing compressed files

Technical field

The present invention relates to the field of computer technology, and in particular to a method and apparatus for processing compressed files.

Background

In the big data field, data compression is a very important technique. Storing massive data in compressed form saves server storage space, improves data processing efficiency, reduces memory and disk I/O overhead, and improves the efficiency of SQL queries on big data.

LZO (Lempel-Ziv-Oberhumer) is a data compression algorithm focused on decompression speed. It is a lossless compression algorithm, and its reference implementation is thread-safe. Decompression is simple, very fast, and requires no memory, and compression is also quite fast.

Presto and LZO are released under different open-source licenses: Presto follows the Apache License 2.0 (a free software license published by the Apache Software Foundation), whereas LZO follows the GPL (General Public License, a widely used free software license). Presto therefore cannot integrate the LZO source code into its own source code to support the LZO compression format.

Summary of the invention

In view of this, the present invention provides a method and apparatus for processing compressed files that enable Presto to support the LZO compression format.

To achieve the above object, according to one aspect of the present invention, a method for processing compressed files is provided.

The method for processing compressed files of the present invention comprises: when a Presto server starts, importing the Hadoop-lzo plug-in as a third-party plug-in; and, when the Presto server reads a file in the lzo compression format, processing that file using the Hadoop-lzo plug-in.

Optionally, when the Presto server reads a file in the lzo compression format, the method further comprises: determining whether an index file exists for the file in the lzo compression format and, if so, splitting the file into multiple data slices according to the index file. The step of processing the file in the lzo compression format using the Hadoop-lzo plug-in then comprises: processing the multiple data slices in parallel using the Hadoop-lzo plug-in.

Optionally, when the Hadoop-lzo plug-in is used to process data in the lzo compression format, an lzo decompression function is called, wherein the lzo decompression function inherits a general decompression function and overrides it using the interface provided by the Hadoop-lzo plug-in.

According to another aspect of the present invention, a device for processing compressed files is provided.

The device for processing compressed files of the present invention comprises: a plug-in import module, configured to import the Hadoop-lzo plug-in as a third-party plug-in when a Presto server starts; and a processing module, configured to process a file in the lzo compression format using the Hadoop-lzo plug-in when the Presto server reads such a file.

Optionally, the processing module is further configured to, when the Presto server reads a file in the lzo compression format, determine whether an index file exists for that file and, if so, split the file into multiple data slices according to the index file, and to process the multiple data slices in parallel using the Hadoop-lzo plug-in.

Optionally, the processing module is further configured to call an lzo decompression function when the Hadoop-lzo plug-in is used to process data in the lzo compression format, wherein the lzo decompression function inherits a general decompression function and overrides it using the interface provided by the Hadoop-lzo plug-in.

According to the technical solution of the present invention, the Hadoop-lzo plug-in is imported as a third-party plug-in when the Presto server starts, and files in the lzo compression format are processed using the Hadoop-lzo plug-in. The Hadoop-lzo plug-in provides interfaces for performing various kinds of processing on LZO compressed files, so LZO compressed files can be processed through the public interfaces the plug-in provides, without the licensing incompatibility that would arise from using the LZO source code directly. Presto can thereby support the LZO compression format. In addition, splitting an LZO file according to its LZO index and processing the slices in parallel further increases data processing speed. To support another compression format, it is only necessary to add a new plug-in and override the general decompression function using the interface that plug-in provides, which makes the system easy to extend.

Brief description of the drawings

The drawings are provided for a better understanding of the present invention and do not constitute an undue limitation on it. In the drawings:

Fig. 1 is a schematic diagram of the relationship among the Presto server, the plug-in interface, and third-party plug-ins according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the basic steps of the method for processing compressed files according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the main modules of the device for processing compressed files according to an embodiment of the present invention.

Detailed description of the embodiments

Exemplary embodiments of the present invention are described below with reference to the drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art should recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.

The embodiments of the present invention make use of Presto's support for third-party plug-ins. The basic principle is as follows: a third-party plug-in is added to the plug-in set of the Presto server, and the third-party plug-in is imported dynamically when the Presto server starts. At run time, Presto dynamically looks up and binds the third-party plug-in and calls the interfaces it provides, so that the plug-in's functionality is integrated into Presto. In the embodiments of the present invention, the Hadoop-lzo plug-in is added to the Presto plug-in set as a third-party plug-in, and Presto binds the plug-in dynamically. Fig. 1 is a schematic diagram of the relationship among the Presto server, the plug-in interface, and third-party plug-ins according to an embodiment of the present invention.
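
The import-then-bind principle described above can be sketched roughly as follows. This is an illustrative Python model only, not Presto's plug-in mechanism: `PluginRegistry`, `StandInLzoPlugin`, and the `handles` method are hypothetical names introduced for the sketch, and nothing here is the real Hadoop-lzo API.

```python
from typing import Callable, Dict


class StandInLzoPlugin:
    """Hypothetical stand-in for the Hadoop-lzo plug-in (not its real API)."""

    name = "hadoop-lzo"

    def handles(self, suffix: str) -> bool:
        # Assumption: the plug-in announces which file suffixes it can process.
        return suffix == ".lzo"


class PluginRegistry:
    """Hypothetical model of the server's plug-in set."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}
        self._bound: Dict[str, object] = {}

    def add(self, name: str, factory: Callable[[], object]) -> None:
        # Adding a plug-in to the set does not load it yet.
        self._factories[name] = factory

    def start(self) -> None:
        # At server startup, every plug-in in the set is imported and bound.
        for name, factory in self._factories.items():
            self._bound[name] = factory()

    def lookup(self, name: str) -> object:
        # At run time, the server looks up a bound plug-in by name.
        return self._bound[name]


registry = PluginRegistry()
registry.add(StandInLzoPlugin.name, StandInLzoPlugin)
registry.start()
plugin = registry.lookup("hadoop-lzo")
```

The server code in this sketch depends only on the registry, so the codec itself stays outside the server's own source, which is the property the embodiment relies on.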

Fig. 2 is a schematic diagram of the basic steps of the method for processing compressed files according to an embodiment of the present invention. The method is performed by a Presto server.

Step S21: The Presto server starts.

Step S22: The Hadoop-lzo plug-in is imported as a third-party plug-in.

Step S23: Data is read from a data source.

Step S24: It is determined whether the data read is a file in the lzo compression format. If so, the flow proceeds to step S25; otherwise, it proceeds to step S26.

Step S25: The file in the lzo compression format is processed using the Hadoop-lzo plug-in.

Step S26: The file is processed according to its format.
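
The branch in steps S24 to S26 can be sketched as below. The sketch assumes, purely for illustration, that the format is detected from a `.lzo` file suffix (the description does not say how the format is detected), and `process_file`, `stand_in_lzo`, and `other_handlers` are hypothetical names.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict

Handler = Callable[[bytes], bytes]


def process_file(path: str, raw: bytes, lzo_plugin: Handler,
                 other_handlers: Dict[str, Handler]) -> bytes:
    suffix = PurePosixPath(path).suffix
    # S24: assumption, the lzo format is recognized by the ".lzo" suffix.
    if suffix == ".lzo":
        # S25: hand the data to the plug-in.
        return lzo_plugin(raw)
    # S26: fall back to a handler for the detected format, or pass through.
    handler = other_handlers.get(suffix, lambda data: data)
    return handler(raw)


def stand_in_lzo(raw: bytes) -> bytes:
    # Hypothetical plug-in entry point; it only tags the data it receives.
    return b"lzo:" + raw


result_lzo = process_file("logs/part-0.lzo", b"abc", stand_in_lzo, {})
result_txt = process_file("logs/part-0.txt", b"abc", stand_in_lzo, {})
```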

In the above flow, it may also be determined whether an index file exists for the file in the lzo compression format that was read. If so, the file is split into multiple data slices according to the index file. When the file is processed, the Hadoop-lzo plug-in then processes these data slices in parallel, which further improves data processing efficiency.
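
A minimal sketch of index-driven splitting and parallel processing follows. It assumes that the index file is a sequence of 8-byte big-endian offsets, each marking the start of a compressed block; the description does not specify the index layout, so treat this as an assumption. `read_index`, `make_slices`, and `process_parallel` are hypothetical helpers, and `len` stands in for the plug-in's per-slice processing.

```python
import struct
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def read_index(index_bytes: bytes) -> List[int]:
    # Assumption: the index is a flat list of 8-byte big-endian offsets.
    count = len(index_bytes) // 8
    return list(struct.unpack(f">{count}q", index_bytes[:count * 8]))


def make_slices(offsets: List[int], file_size: int) -> List[Tuple[int, int]]:
    # Each slice runs from one block offset to the next (or to end of file).
    bounds = sorted(offsets) + [file_size]
    return [(start, end) for start, end in zip(bounds, bounds[1:]) if end > start]


def process_parallel(data: bytes, slices: List[Tuple[int, int]],
                     handle_slice: Callable[[bytes], int],
                     workers: int = 4) -> List[int]:
    # Slices are independent, so they can be handled concurrently;
    # pool.map returns results in slice order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: handle_slice(data[s[0]:s[1]]), slices))


index_bytes = struct.pack(">3q", 0, 10, 25)
offsets = read_index(index_bytes)
slices = make_slices(offsets, file_size=40)
sizes = process_parallel(bytes(range(40)), slices, len)
```

Without an index file there is only one slice, the whole file, and processing falls back to a single sequential pass.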

When a file in the lzo compression format is processed, an LZO decompression function is called. The LZO decompression function inherits a general decompression function and overrides it using the interface provided by the Hadoop-lzo plug-in. This makes the system easy to extend: when another compression format needs to be supported, it is only necessary to add a new plug-in and override the general decompression function using the interface that plug-in provides. The above description takes decompression as an example, but it applies equally when data needs to be compressed.
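
The inherit-and-override arrangement can be sketched as follows. All class names are hypothetical, and `zlib` is used only as a runnable stand-in codec so the example executes without an LZO library; it is not what the Hadoop-lzo plug-in does internally.

```python
import zlib


class Decompressor:
    """General decompression function that format-specific classes inherit."""

    def decompress(self, data: bytes) -> bytes:
        raise NotImplementedError("no codec bound for this format")


class StandInLzoApi:
    """Hypothetical plug-in interface; zlib stands in for the LZO codec."""

    def decode(self, data: bytes) -> bytes:
        return zlib.decompress(data)


class LzoDecompressor(Decompressor):
    """Overrides the general function by delegating to the plug-in."""

    def __init__(self, plugin_api: StandInLzoApi) -> None:
        self._api = plugin_api

    def decompress(self, data: bytes) -> bytes:
        # The override calls the plug-in instead of embedding codec source code.
        return self._api.decode(data)


def read_rows(decompressor: Decompressor, payload: bytes) -> bytes:
    # The caller depends only on the general interface.
    return decompressor.decompress(payload)


payload = zlib.compress(b"row1\nrow2\n")
rows = read_rows(LzoDecompressor(StandInLzoApi()), payload)
```

Supporting another format in this sketch would mean adding one more subclass of `Decompressor` that delegates to a different plug-in, with `read_rows` left unchanged.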

Fig. 3 is a schematic diagram of the main modules of the device for processing compressed files according to an embodiment of the present invention. As shown in Fig. 3, the device 30 for processing compressed files mainly comprises a plug-in import module 31 and a processing module 32. The plug-in import module 31 is configured to import the Hadoop-lzo plug-in as a third-party plug-in when the Presto server starts. The processing module 32 is configured to process a file in the lzo compression format using the Hadoop-lzo plug-in when the Presto server reads such a file.

The processing module 32 may further be configured to, when the Presto server reads a file in the lzo compression format, determine whether an index file exists for that file and, if so, split the file into multiple data slices according to the index file, and to process the multiple data slices in parallel using the Hadoop-lzo plug-in.

The processing module 32 may also be configured to call an lzo decompression function when the Hadoop-lzo plug-in is used to process data in the lzo compression format, wherein the lzo decompression function inherits a general decompression function and overrides it using the interface provided by the Hadoop-lzo plug-in.

According to the technical solution of the embodiments of the present invention, the Hadoop-lzo plug-in is imported as a third-party plug-in when the Presto server starts, and files in the lzo compression format are processed using the Hadoop-lzo plug-in. The Hadoop-lzo plug-in provides interfaces for performing various kinds of processing on LZO compressed files, so LZO compressed files can be processed through the public interfaces the plug-in provides, without the licensing incompatibility that would arise from using the LZO source code directly. Presto can thereby support the LZO compression format. In addition, splitting an LZO file according to its LZO index and processing the slices in parallel further increases data processing speed. To support another compression format, it is only necessary to add a new plug-in and override the general decompression function using the interface that plug-in provides, which makes the system easy to extend.

The basic principle of the present invention has been described above in connection with specific embodiments. In the device and method of the present invention, each component or step can obviously be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalents of the present invention. Furthermore, the steps of the above series of processes may naturally be performed in the chronological order described, but need not be. Some steps may be performed in parallel or independently of one another.

The above specific embodiments do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may occur depending on design requirements and other factors. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for processing a compressed file, characterized by comprising:

when a Presto server starts, importing the Hadoop-lzo plug-in as a third-party plug-in;

when the Presto server reads a file in the lzo compression format, processing the file in the lzo compression format using the Hadoop-lzo plug-in;

when processing data in the lzo compression format using the Hadoop-lzo plug-in, calling an lzo compression or decompression function, wherein the lzo compression or decompression function inherits a general compression or decompression function and overrides it using the interface provided by the Hadoop-lzo plug-in; and, when another compression format needs to be supported, adding a new plug-in and overriding the general compression or decompression function using the interface provided by that plug-in;

when the Presto server reads a file in the lzo compression format, further: determining whether an index file exists for the file in the lzo compression format and, if so, splitting the file in the lzo compression format into multiple data slices according to the index file;

wherein the step of processing the file in the lzo compression format using the Hadoop-lzo plug-in comprises: processing the multiple data slices in parallel using the Hadoop-lzo plug-in.

2. A device for processing a compressed file, characterized by comprising:

a plug-in import module, configured to import the Hadoop-lzo plug-in as a third-party plug-in when a Presto server starts;

a processing module, configured to process a file in the lzo compression format using the Hadoop-lzo plug-in when the Presto server reads the file in the lzo compression format; the processing module being further configured to call an lzo compression or decompression function when processing data in the lzo compression format using the Hadoop-lzo plug-in, wherein the lzo compression or decompression function inherits a general compression or decompression function and overrides it using the interface provided by the Hadoop-lzo plug-in; and, when another compression format needs to be supported, a new plug-in is added and the general compression or decompression function is overridden using the interface provided by that plug-in;

the processing module being further configured to, when the Presto server reads a file in the lzo compression format, determine whether an index file exists for the file in the lzo compression format and, if so, split the file in the lzo compression format into multiple data slices according to the index file, and to process the multiple data slices in parallel using the Hadoop-lzo plug-in.