CN104579357A - Method and device for processing compressed file - Google Patents

Info

Publication number
CN104579357A
Authority
CN
China
Prior art keywords
lzo
plug
file
unit
hadoop
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510016383.6A
Other languages
Chinese (zh)
Other versions
CN104579357B (en)
Inventor
袁安峰
吕信
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201510016383.6A
Publication of CN104579357A
Application granted
Publication of CN104579357B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and device for processing a compressed file, which enable Presto to support the LZO compression format. The method comprises: when a Presto server starts, importing the Hadoop-lzo plug-in as a third-party plug-in; and, when the Presto server reads a file in the LZO compression format, using the Hadoop-lzo plug-in to process that file.

Description

Method and apparatus for processing compressed files
Technical Field
The present invention relates to the field of computer technology, and in particular to a method and apparatus for processing compressed files.
Background
In the big data field, data compression is a very important technology. Storing data in compressed form saves server storage, improves data-processing efficiency, reduces memory and disk I/O overhead, and improves the efficiency of SQL queries over massive data sets.
LZO (Lempel-Ziv-Oberhumer) is a data compression algorithm focused on decompression speed. The algorithm is lossless and its reference implementation is thread-safe. It features simple and very fast decompression that requires no additional memory, and compression that is also quite fast.
Presto and LZO are released under different open-source licenses: Presto follows the Apache License 2.0 (a free software license issued by the Apache Software Foundation), whereas LZO follows the GPL (General Public License, a widely used free software license). Presto therefore cannot integrate the LZO source code into its own source code in order to support the LZO compression format.
Summary of the Invention
In view of this, the invention provides a method and apparatus for processing compressed files that enable Presto to support the LZO compression format.
To achieve the above object, according to one aspect of the present invention, a method for processing a compressed file is provided.
The method of the present invention comprises: when a Presto server starts, importing the Hadoop-lzo plug-in as a third-party plug-in; and, when the Presto server reads a file in the LZO compression format, using the Hadoop-lzo plug-in to process that file.
Optionally, when the Presto server reads a file in the LZO compression format, the method further comprises: determining whether an index file exists for the LZO file and, if so, splitting the LZO file according to the index file to obtain multiple data slices. The step of using the Hadoop-lzo plug-in to process the LZO file then comprises: using the Hadoop-lzo plug-in to process the multiple data slices in parallel.
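The index-based splitting described above can be sketched as follows. This is only an illustration under stated assumptions: the index is assumed to be a flat sequence of 8-byte big-endian offsets, each marking where a compressed block begins (this layout should be checked against the Hadoop-lzo version actually in use), and the names `read_block_offsets`, `make_splits`, and `target_size` are hypothetical and do not come from the patent.

```python
import struct
from typing import List, Tuple


def read_block_offsets(index_bytes: bytes) -> List[int]:
    """Parse an index as consecutive 8-byte big-endian block offsets (assumed layout)."""
    count = len(index_bytes) // 8
    return list(struct.unpack(f">{count}q", index_bytes[: count * 8]))


def make_splits(offsets: List[int], file_size: int, target_size: int) -> List[Tuple[int, int]]:
    """Group block boundaries into (start, end) byte ranges of at least target_size bytes."""
    if not offsets:
        # No index available: the whole file has to be handled as a single slice.
        return [(0, file_size)]
    splits = []
    start = offsets[0]
    for boundary in offsets[1:]:
        # Cut only at block boundaries, so every slice can be decompressed on its own.
        if boundary - start >= target_size:
            splits.append((start, boundary))
            start = boundary
    # The final slice runs to the end of the file.
    splits.append((start, file_size))
    return splits
```

Cutting only at offsets listed in the index is what makes the slices independent of each other; without an index the file falls back to one slice and cannot be processed in parallel.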
Optionally, when the Hadoop-lzo plug-in is used to process data in the LZO compression format, an LZO decompression function is called, where this LZO decompression function inherits from a general decompression function and overrides it using the interfaces provided by the Hadoop-lzo plug-in.
According to another aspect of the present invention, a device for processing a compressed file is provided.
The device of the present invention comprises: a plug-in import module, configured to import the Hadoop-lzo plug-in as a third-party plug-in when a Presto server starts; and a processing module, configured to use the Hadoop-lzo plug-in to process a file in the LZO compression format when the Presto server reads such a file.
Optionally, the processing module is further configured to, when the Presto server reads a file in the LZO compression format, determine whether an index file exists for the LZO file and, if so, split the LZO file according to the index file to obtain multiple data slices, and to use the Hadoop-lzo plug-in to process the multiple data slices in parallel.
Optionally, the processing module is further configured to call an LZO decompression function when the Hadoop-lzo plug-in is used to process data in the LZO compression format, where this LZO decompression function inherits from a general decompression function and overrides it using the interfaces provided by the Hadoop-lzo plug-in.
According to the technical solution of the present invention, when the Presto server starts, the Hadoop-lzo plug-in is imported as a third-party plug-in and is used to process files in the LZO compression format. The Hadoop-lzo plug-in provides interfaces for performing various kinds of processing on LZO compressed files, so LZO compressed files can be processed through the public interfaces of this plug-in without the license-incompatibility problem that using the LZO source code directly would cause. In this way Presto can support the LZO compression format. In addition, by using the LZO index to split an LZO file and process the slices in parallel, data-processing speed can be improved further. If another compression format needs to be supported, it is only necessary to add a new plug-in and override the general decompression function using the interfaces that plug-in provides, which makes the system easy to extend.
Brief Description of the Drawings
The accompanying drawings are provided for a better understanding of the present invention and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a schematic diagram of the relationship among the Presto server, the plug-in interface, and the third-party plug-in according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the basic steps of the method for processing a compressed file according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the main modules of the device for processing a compressed file according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are explained below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.
The embodiments of the present invention make use of the support for third-party plug-ins that Presto provides. The basic principle is as follows: a third-party plug-in is integrated into the plug-in set of the Presto server and is imported dynamically when the Presto server starts. At run time, Presto dynamically looks up and binds the third-party plug-in and calls the interfaces it provides, thereby integrating the plug-in's functions into Presto. In the embodiments of the present invention, the Hadoop-lzo plug-in is imported into the Presto plug-in set as a third-party plug-in, and Presto binds to it dynamically. Fig. 1 is a schematic diagram of the relationship among the Presto server, the plug-in interface, and the third-party plug-in according to an embodiment of the present invention.
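The idea of importing a plug-in dynamically at startup and binding to its interface at run time can be illustrated with a small sketch. It is not Presto's actual plug-in mechanism: the function `load_plugins`, the configuration mapping, and the `.zz` extension are hypothetical, and Python standard-library codec modules stand in for a third-party plug-in such as Hadoop-lzo.

```python
import importlib
from typing import Callable, Dict


def load_plugins(config: Dict[str, str]) -> Dict[str, Callable[[bytes], bytes]]:
    """Import each configured module by name at startup and bind its decompress interface."""
    registry: Dict[str, Callable[[bytes], bytes]] = {}
    for extension, module_name in config.items():
        # The module is resolved at run time, so the host never links the codec's source.
        module = importlib.import_module(module_name)
        # Every plug-in is expected to expose the same agreed interface name.
        registry[extension] = getattr(module, "decompress")
    return registry
```

Because the host depends only on the agreed interface name, a codec can be added or swapped by changing the configuration rather than the host's source code.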
Fig. 2 is a schematic diagram of the basic steps of the method for processing a compressed file according to an embodiment of the present invention. The method is performed by the Presto server.
Step S21: The Presto server starts.
Step S22: The Hadoop-lzo plug-in is imported as a third-party plug-in.
Step S23: Data is read from the data source.
Step S24: It is determined whether the data read is a file in the LZO compression format. If so, the method proceeds to step S25; otherwise it proceeds to step S26.
Step S25: The Hadoop-lzo plug-in is used to process the file in the LZO compression format.
Step S26: The file is processed according to the format of the file that was read.
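Steps S24 to S26 amount to a dispatch on the file format, which might be sketched as below. The patent does not say how the format is recognized; checking the file-name suffix is an assumption made for this example, and `process_file`, `lzo_plugin`, and `default_handlers` are hypothetical names.

```python
from typing import Callable, Dict

Handler = Callable[[bytes], bytes]


def process_file(path: str, data: bytes, lzo_plugin: Handler,
                 default_handlers: Dict[str, Handler]) -> bytes:
    """Route a file to the LZO plug-in or to the handler for its own format."""
    name = path.lower()
    # S24: decide whether the data read is an LZO file (suffix check is an assumption).
    if name.endswith(".lzo"):
        # S25: hand the file to the imported plug-in.
        return lzo_plugin(data)
    # S26: otherwise process the file according to its own format.
    for suffix, handler in default_handlers.items():
        if name.endswith(suffix):
            return handler(data)
    # Unrecognized suffix: treat the content as uncompressed.
    return data
```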
In the above flow, it can also be determined whether an index file exists for the LZO file that was read. If so, the LZO file is split according to the index file to obtain multiple data slices. When the LZO file is then processed, the Hadoop-lzo plug-in processes these data slices in parallel, which further improves data-processing efficiency.
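Parallel processing of the data slices can be sketched with a thread pool. This is a single-process stand-in for illustration only (a real deployment would more likely spread slices across worker nodes), and `process_splits_in_parallel` and `decode_split` are hypothetical names, with `decode_split` standing in for the call into the plug-in.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def process_splits_in_parallel(data: bytes,
                               splits: List[Tuple[int, int]],
                               decode_split: Callable[[bytes], bytes],
                               max_workers: int = 4) -> List[bytes]:
    """Decode every (start, end) slice concurrently and return results in slice order."""
    def work(split: Tuple[int, int]) -> bytes:
        start, end = split
        return decode_split(data[start:end])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the input slices.
        return list(pool.map(work, splits))
```

The results come back in slice order, so concatenating them reproduces the sequential output.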
When a file in the LZO compression format is processed, the LZO compression and decompression functions are called. Taking decompression as the example, the LZO decompression function inherits from a general decompression function and overrides it using the interfaces provided by the Hadoop-lzo plug-in. This approach makes the system easy to extend: when another compression format needs to be supported, it is only necessary to add a new plug-in and override the general decompression function using the interfaces that plug-in provides. The description above uses decompression as the example, but it applies equally when data needs to be compressed.
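The inherit-and-override relationship can be illustrated as follows. The class names are hypothetical, and because the Python standard library has no LZO codec, `zlib` is used here purely as a stand-in for the interface a plug-in such as Hadoop-lzo would provide.

```python
import zlib
from typing import Callable


class GenericDecompressor:
    """General decompression function that the host calls for every format."""

    def decompress(self, data: bytes) -> bytes:
        raise NotImplementedError("a format-specific subclass must override this")


class PluginBackedDecompressor(GenericDecompressor):
    """Format-specific subclass that overrides decompress() by delegating to a plug-in."""

    def __init__(self, plugin_decompress: Callable[[bytes], bytes]) -> None:
        self._plugin_decompress = plugin_decompress

    def decompress(self, data: bytes) -> bytes:
        # The codec work happens inside the plug-in, reached only through its interface.
        return self._plugin_decompress(data)


# Stand-in for the LZO case: zlib plays the role of the third-party plug-in.
lzo_like = PluginBackedDecompressor(zlib.decompress)
```

Supporting a further format then means creating another subclass instance around a different plug-in function; the code that calls `decompress()` stays unchanged.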
Fig. 3 is a schematic diagram of the main modules of the device for processing a compressed file according to an embodiment of the present invention. As shown in Fig. 3, the device 30 for processing a compressed file mainly comprises a plug-in import module 31 and a processing module 32. The plug-in import module 31 is configured to import the Hadoop-lzo plug-in as a third-party plug-in when the Presto server starts. The processing module 32 is configured to use the Hadoop-lzo plug-in to process a file in the LZO compression format when the Presto server reads such a file.
The processing module 32 may also be configured to, when the Presto server reads a file in the LZO compression format, determine whether an index file exists for the LZO file and, if so, split the LZO file according to the index file to obtain multiple data slices, and to use the Hadoop-lzo plug-in to process the multiple data slices in parallel.
The processing module 32 may also be configured to call an LZO decompression function when the Hadoop-lzo plug-in is used to process data in the LZO compression format, where this LZO decompression function inherits from a general decompression function and overrides it using the interfaces provided by the Hadoop-lzo plug-in.
According to the technical solution of the embodiments of the present invention, when the Presto server starts, the Hadoop-lzo plug-in is imported as a third-party plug-in and is used to process files in the LZO compression format. The Hadoop-lzo plug-in provides interfaces for performing various kinds of processing on LZO compressed files, so LZO compressed files can be processed through the public interfaces of this plug-in without the license-incompatibility problem that using the LZO source code directly would cause. In this way Presto can support the LZO compression format. In addition, by using the LZO index to split an LZO file and process the slices in parallel, data-processing speed can be improved further. If another compression format needs to be supported, it is only necessary to add a new plug-in and override the general decompression function using the interfaces that plug-in provides, which makes the system easy to extend.
The basic principle of the present invention has been described above in conjunction with specific embodiments. In the device and method of the present invention, each component or step can obviously be decomposed and/or recombined, and such decompositions and/or recombinations should be regarded as equivalents of the present invention. Furthermore, the steps of the above series of processes may naturally be performed chronologically in the order described, but they need not be. Some steps may be performed in parallel or independently of one another.
The above embodiments do not limit the scope of protection of the present invention. Those skilled in the art should understand that, depending on design requirements and other factors, various modifications, combinations, sub-combinations, and substitutions may occur. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (6)

1. A method for processing a compressed file, characterized by comprising:
when a Presto server starts, importing the Hadoop-lzo plug-in as a third-party plug-in;
when the Presto server reads a file in the LZO compression format, using the Hadoop-lzo plug-in to process the file in the LZO compression format.
2. The method according to claim 1, characterized in that,
when the Presto server reads a file in the LZO compression format, the method further comprises: determining whether an index file exists for the file in the LZO compression format and, if so, splitting the file in the LZO compression format according to the index file to obtain multiple data slices;
the step of using the Hadoop-lzo plug-in to process the file in the LZO compression format comprises: using the Hadoop-lzo plug-in to process the multiple data slices in parallel.
3. The method according to claim 1 or 2, characterized in that, when the Hadoop-lzo plug-in is used to process data in the LZO compression format, an LZO decompression function is called, wherein the LZO decompression function inherits from a general decompression function and overrides it using the interfaces provided by the Hadoop-lzo plug-in.
4. A device for processing a compressed file, characterized by comprising:
a plug-in import module, configured to import the Hadoop-lzo plug-in as a third-party plug-in when a Presto server starts;
a processing module, configured to use the Hadoop-lzo plug-in to process a file in the LZO compression format when the Presto server reads the file in the LZO compression format.
5. The device according to claim 4, characterized in that
the processing module is further configured to, when the Presto server reads a file in the LZO compression format, determine whether an index file exists for the file in the LZO compression format and, if so, split the file in the LZO compression format according to the index file to obtain multiple data slices, and to use the Hadoop-lzo plug-in to process the multiple data slices in parallel.
6. The device according to claim 4 or 5, characterized in that the processing module is further configured to call an LZO decompression function when the Hadoop-lzo plug-in is used to process data in the LZO compression format, wherein the LZO decompression function inherits from a general decompression function and overrides it using the interfaces provided by the Hadoop-lzo plug-in.
CN201510016383.6A 2015-01-13 2015-01-13 The method and apparatus for handling compressed file Active CN104579357B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510016383.6A CN104579357B (en) 2015-01-13 2015-01-13 The method and apparatus for handling compressed file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510016383.6A CN104579357B (en) 2015-01-13 2015-01-13 The method and apparatus for handling compressed file

Publications (2)

Publication Number Publication Date
CN104579357A true CN104579357A (en) 2015-04-29
CN104579357B CN104579357B (en) 2018-06-22

Family

ID=53094688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510016383.6A Active CN104579357B (en) 2015-01-13 2015-01-13 The method and apparatus for handling compressed file

Country Status (1)

Country Link
CN (1) CN104579357B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040066968A1 (en) * 2002-10-07 2004-04-08 Infocus Corporation Data compression and decompression system and method
CN102708187A (en) * 2012-05-14 2012-10-03 成都信息工程学院 Reverse index mixed compression and decompression method based on Hbase database
CN102970158A (en) * 2012-11-05 2013-03-13 广东睿江科技有限公司 Log storage and processing method and log server
CN103366015A (en) * 2013-07-31 2013-10-23 东南大学 OLAP (on-line analytical processing) data storage and query method based on Hadoop
US20130307709A1 (en) * 2012-04-25 2013-11-21 Pure Storage, Inc. Efficient techniques for aligned fixed-length compression

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
TINYID: "Using LZO compression in Hadoop", 《HTTP://BLOG.CSDN.NET/XIAOLANG85/ARTICLE/DETAILS/8649756》 *
TINYID: "Presto: another computing architecture that extends and accelerates Hadoop's computing capability", 《HTTP://BLOG.CSDN.NET/CNWEIKE/ARTICLE/DETAILS/39519059》 *
喜啊: "Example of local LZO compression and decompression", 《HTTP://BLOG.CSDN.NET/SCORPIOHJX2/ARTICLE/DETAILS/18423529》 *

Also Published As

Publication number Publication date
CN104579357B (en) 2018-06-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant