Summary of the invention
The main purpose of the present invention is to provide a method and an apparatus for processing RAR-format compressed files, so as to solve the problem in the prior art that Hadoop cannot read and analyze RAR-format compressed files.
To achieve the above purpose, according to one aspect of the present invention, a method for processing RAR-format compressed files is provided.
The method for processing RAR-format compressed files comprises: determining the RAR-format compressed file to be processed; obtaining a pre-created file loading class and a pre-created file decompression class; decompressing the RAR-format compressed file to be processed by calling the file decompression class from within the file loading class, to obtain a decompressed file; obtaining the storage path of the decompressed file; and analyzing the decompressed file by means of a data analysis function, to obtain a processing result.
Further, decompressing the RAR-format compressed file to be processed by calling the file decompression class from within the file loading class, to obtain the decompressed file, comprises: calling the file decompression class from within the file loading class; and, within the file decompression class, calling a decompression function in a decompression package to obtain the decompressed file, wherein the decompression package stores the decompression function used to decompress the RAR-format compressed file to be processed.
Further, analyzing the decompressed file by means of the data analysis function, to obtain the processing result, comprises: obtaining the return value of the file decompression class, wherein the return value is a character string corresponding to the storage path of the decompressed file; and sending the return value of the file decompression class to the data analysis function for analysis, to obtain the processing result.
Further, sending the return value of the file decompression class to the data analysis function for analysis, to obtain the processing result, comprises: converting the return value of the file decompression class into a path-class address; obtaining the decompressed file stored at the path-class address; and analyzing the decompressed file with the data analysis function, to obtain the processing result.
Further, after the data analysis function analyzes the decompressed file and obtains the processing result, the method further comprises: deleting the decompressed file stored at the path-class address; and storing the processing result at the address corresponding to a preset storage path.
Further, the method starts multiple processes simultaneously, wherein in each process the data analysis function analyzes a decompressed file. After the data analysis functions analyze the decompressed files and obtain processing results, the method further comprises: merging the multiple processing results obtained by the data analysis functions in the multiple processes, to obtain a merged processing result; and outputting the merged processing result.
To achieve the above purpose, according to another aspect of the present invention, an apparatus for processing RAR-format compressed files is provided.
The apparatus for processing RAR-format compressed files comprises: a determination module, configured to determine the RAR-format compressed file to be processed; a first acquisition module, configured to obtain the pre-created file loading class and file decompression class; a decompression module, configured to decompress the RAR-format compressed file to be processed by calling the file decompression class from within the file loading class, to obtain a decompressed file; a second acquisition module, configured to obtain the storage path of the decompressed file; and a processing module, configured to analyze the decompressed file by means of a data analysis function, to obtain a processing result.
Further, the decompression module comprises: a first calling module, configured to call the file decompression class from within the file loading class; and a second calling module, configured to call, within the file decompression class, a decompression function in a decompression package to obtain the decompressed file, wherein the decompression package stores the decompression function used to decompress the RAR-format compressed file to be processed.
Further, the processing module comprises: a second obtaining submodule, configured to obtain the return value of the file decompression class, wherein the return value is a character string corresponding to the storage path of the decompressed file; and a first processing submodule, configured to send the return value of the file decompression class to the data analysis function for analysis, to obtain the processing result. The first processing submodule comprises: a conversion module, configured to convert the return value of the file decompression class into a path-class address; a third obtaining submodule, configured to obtain the decompressed file stored at the path-class address; and a second processing submodule, configured to have the data analysis function analyze the decompressed file, to obtain the processing result.
Further, the apparatus for processing RAR-format compressed files also comprises: a deletion module, configured to delete the decompressed file stored at the path-class address; and a storage module, configured to store the processing result at the address corresponding to a preset storage path.
By determining the RAR-format compressed file to be processed; obtaining the pre-created file loading class and file decompression class; decompressing the RAR-format compressed file to be processed by calling the file decompression class from within the file loading class, to obtain a decompressed file; obtaining the storage path of the decompressed file; and analyzing the decompressed file by means of the data analysis function, to obtain a processing result, the present invention solves the problem in the prior art that Hadoop cannot read and analyze RAR-format compressed files. The invention creates the file loading class and the file decompression class on the basis of the input splitting function and the reading function, uses the file decompression class to read and decompress the RAR-format compressed file in HDFS to obtain a decompressed file, sends the decompressed file to Map for analysis to obtain a processing result, and finally stores the processing result in HDFS. In the invention, Hadoop can start multiple Map tasks simultaneously; each Map corresponds to one file decompression class, and each file decompression class reads one RAR-format compressed file, so that multiple RAR-format compressed files to be processed are handled at the same time and execution efficiency is improved. In addition, after Map completes the analysis and obtains the processing result, the invention deletes the temporarily stored decompressed file, which saves system space.
Embodiment
It should be noted that, provided there is no conflict, the embodiments in the application and the features in the embodiments may be combined with one another. The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
To enable those skilled in the art to better understand the solution of the application, the technical solutions in the embodiments of the application are described clearly and completely below with reference to the accompanying drawings in the embodiments of the application. Obviously, the described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments in the application, without creative effort, shall fall within the protection scope of the application.
It should be noted that the terms "first", "second", and the like in the specification, the claims, and the above accompanying drawings of the application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of the application described herein can be implemented. In addition, the terms "comprise" and "have", and any variations of them, are intended to cover non-exclusive inclusion: for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
The present invention aims to provide a method and an apparatus for processing RAR-format compressed files.
Fig. 2 is a flowchart of the method for processing RAR-format compressed files according to an embodiment of the present invention. As shown in Fig. 2, the method comprises the following steps S101 to S105:
Step S101, determining the RAR-format compressed file to be processed.
HDFS usually stores many RAR-format compressed files. The method of this embodiment can read and process a single RAR-format compressed file, or multiple RAR-format compressed files. The number of RAR-format compressed files to be processed can be determined according to the specific analysis requirements. Preferably, the method of this embodiment determines multiple RAR-format compressed files to be processed; compared with reading the RAR-format compressed files one by one and processing each separately, this greatly improves processing efficiency.
Preferably, determining the RAR-format compressed file to be processed further comprises obtaining the storage path of that file, so that the file stored at the address corresponding to the storage path can be obtained accurately.
Step S102, obtaining the pre-created file loading class and file decompression class.
Reading and analyzing the RAR-format compressed file to be processed relies on class functions: for example, reading the file requires a file loading function, and decompressing it requires a file decompression function. The method of this embodiment creates the file loading class RarInputFormat for RAR-format compressed files by inheriting from InputFormat, the input splitting function for ordinary files in the Hadoop framework, and creates the file decompression class RarRecordReader for RAR-format compressed files by inheriting from RecordReader, the reading function for ordinary files in the Hadoop framework. RarInputFormat can read one or more RAR-format compressed files from HDFS and split them into several data file fragments. RarRecordReader reads these data file fragments, decompresses them to create the decompressed file, obtains the storage path at which the decompressed file is stored, and passes this storage path as a parameter to the data analysis function Map. Together, these two classes implement the reading and decompression of RAR-format compressed files.
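The relationship between the two classes can be modelled in plain Java as follows. This is only a sketch under stated assumptions: the interfaces InputFormatModel and RecordReaderModel are hypothetical stand-ins for Hadoop's InputFormat and RecordReader and are not the Hadoop API; the sketch assumes one split per RAR archive; and real decompression is replaced by a placeholder path suffix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RarFormatSketch {

    /** Hypothetical stand-in for Hadoop's RecordReader: produces one value per split. */
    interface RecordReaderModel {
        String readValue();
    }

    /** Hypothetical stand-in for Hadoop's InputFormat: splits input and creates readers. */
    interface InputFormatModel {
        List<String> getSplits(List<String> inputPaths);
        RecordReaderModel createRecordReader(String split);
    }

    /** Models RarRecordReader: handles one archive and returns a storage path string. */
    static final class RarRecordReaderModel implements RecordReaderModel {
        private final String archivePath;

        RarRecordReaderModel(String archivePath) {
            this.archivePath = archivePath;
        }

        @Override
        public String readValue() {
            // Assumption: decompression is omitted; a suffixed path stands in for
            // the storage path of the decompressed file.
            return archivePath + ".extracted";
        }
    }

    /** Models RarInputFormat: keeps only .rar inputs and hands each to its own reader. */
    static final class RarInputFormatModel implements InputFormatModel {
        @Override
        public List<String> getSplits(List<String> inputPaths) {
            List<String> splits = new ArrayList<>();
            for (String path : inputPaths) {
                if (path.endsWith(".rar")) {
                    splits.add(path); // assumption: one split per archive
                }
            }
            return splits;
        }

        @Override
        public RecordReaderModel createRecordReader(String split) {
            return new RarRecordReaderModel(split);
        }
    }

    public static void main(String[] args) {
        InputFormatModel format = new RarInputFormatModel();
        List<String> inputs = Arrays.asList("/in/a.rar", "/in/notes.txt", "/in/b.rar");
        for (String split : format.getSplits(inputs)) {
            System.out.println(split + " -> " + format.createRecordReader(split).readValue());
        }
    }
}
```

In this model the loading class owns the decision of what to read, and each reader owns exactly one archive, which mirrors the division of work described for RarInputFormat and RarRecordReader.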
Step S103, decompressing the RAR-format compressed file to be processed by calling the file decompression class from within the file loading class, to obtain a decompressed file.
Within the file loading class RarInputFormat, the file decompression class RarRecordReader is called to decompress the RAR-format compressed file and obtain the decompressed file. Specifically, this comprises: calling the file decompression class RarRecordReader from within the file loading class RarInputFormat; and, within RarRecordReader, calling a decompression function in a decompression package to obtain the decompressed file, wherein the decompression package stores the decompression function used to decompress the RAR-format compressed file to be processed. The decompression package in this embodiment is preferably java-unrar-0.5.jar, which can be downloaded from https://clojars.org/org.clojars.bonega/java-unrar. RarRecordReader calls the decompression function in this jar package to decompress the RAR-format compressed file. Decompressing the RAR-format compressed file makes it convenient to read or process the file's content.
Step S104, obtaining the storage path of the decompressed file.
The decompressed file is obtained after the file decompression class RarRecordReader, called from within the file loading class RarInputFormat, decompresses the RAR-format compressed file to be processed. Preferably, the decompressed file is stored temporarily in HDFS. The storage path of the decompressed file in HDFS is passed to the data analysis function Map as the return value Value of the file decompression class RarRecordReader. Supplying this storage path as an input parameter lets the data analysis function Map locate the decompressed file that corresponds to the RAR-format compressed file, which makes it convenient for Map to analyze the decompressed file.
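Steps S103 and S104 can be sketched together as a reader that delegates to a decompression function, writes the output to a temporary file, and returns that file's path as a string. The sketch below rests on several assumptions: the Decompressor interface is hypothetical; the JDK's GZIP classes are swapped in for the RAR decompression function because the JDK has no RAR support; and a local temporary directory stands in for HDFS.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class DecompressToTempSketch {

    /** Hypothetical stand-in for the decompression function in the decompression package. */
    interface Decompressor {
        InputStream open(InputStream compressed) throws IOException;
    }

    /** Decompresses the input into a temporary file and returns that file's path as a string. */
    static String decompressToTemp(InputStream compressed, Decompressor decompressor) throws IOException {
        // Assumption: the local temp directory stands in for temporary storage in HDFS.
        Path temp = Files.createTempFile("extracted-", ".tmp");
        try (InputStream in = decompressor.open(compressed)) {
            Files.copy(in, temp, StandardCopyOption.REPLACE_EXISTING);
        }
        return temp.toString(); // the string that would be handed to Map as Value
    }

    /** Helper that builds GZIP test data in memory (GZIP is a stand-in for RAR). */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] archive = gzip("line one\nline two\n");
        String value = decompressToTemp(new ByteArrayInputStream(archive), GZIPInputStream::new);
        System.out.println("value passed to Map: " + value);
        Files.deleteIfExists(Paths.get(value));
    }
}
```

Passing a path string rather than the file content keeps the value handed to Map small, at the cost of requiring Map to open the file itself.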
Step S105, analyzing the decompressed file by means of the data analysis function, to obtain a processing result.
After the data analysis function Map receives the return value Value of the file decompression class RarRecordReader, it can analyze the decompressed file. Preferably, in the method of this embodiment, analyzing the decompressed file by means of the data analysis function, to obtain the processing result, may specifically comprise: obtaining the return value of the file decompression class, wherein the return value is a character string corresponding to the storage path of the decompressed file; and sending the return value of the file decompression class to the data analysis function for analysis, to obtain the processing result.
Specifically, sending the return value of the file decompression class to the data analysis function for analysis, to obtain the processing result, may comprise: converting the return value of the file decompression class into a path-class address by means of Hadoop's path class Path; obtaining, by means of the data acquisition function FSDataInputStream, the decompressed file stored at the path-class address pointed to by Path; and completing the analysis of the decompressed file in the data analysis function according to the business analysis logic, to obtain the processing result.
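The three sub-steps above (string to path, path to stream, stream to analysis) can be sketched in plain Java. This is an illustration under assumptions: java.nio.file.Path stands in for Hadoop's Path, a BufferedReader stands in for FSDataInputStream, and counting non-empty lines is a placeholder for the business analysis logic, which the source does not specify.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class MapSideReadSketch {

    /** Converts the path string to a Path, opens the file, and counts its non-empty lines. */
    static long analyze(String value) throws IOException {
        // Sub-step 1: convert the returned string into a path object
        // (java.nio Path is a stand-in for Hadoop's Path).
        Path path = Paths.get(value);
        // Sub-step 2: open the decompressed file stored at that path
        // (BufferedReader is a stand-in for FSDataInputStream).
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            // Sub-step 3: placeholder business logic, assumed for illustration only.
            return reader.lines().filter(line -> !line.trim().isEmpty()).count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path sample = Files.createTempFile("extracted-", ".txt");
        try {
            Files.write(sample, "alpha\n\nbeta\n".getBytes(StandardCharsets.UTF_8));
            System.out.println("non-empty lines: " + analyze(sample.toString()));
        } finally {
            Files.deleteIfExists(sample);
        }
    }
}
```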
Preferably, after the data analysis function analyzes the decompressed file and obtains the processing result, the method of this embodiment further comprises: deleting the decompressed file temporarily stored in HDFS at the path-class address pointed to by Path, to free disk space; and outputting the processing result through the output function OutputFormat, so that it is stored at the address corresponding to the preset storage path. Deleting the decompressed file temporarily stored in HDFS helps free system space.
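The delete-then-store behaviour can be sketched as follows. The assumptions are that local files stand in for HDFS paths, that a line count stands in for the analysis, and that a direct file write stands in for OutputFormat. Placing the deletion in a finally block is a design choice of this sketch, not something the source states; it removes the temporary file even when the analysis fails.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class CleanupAndStoreSketch {

    /** Analyzes the decompressed file, stores the result, and always deletes the temporary file. */
    static void analyzeStoreAndCleanUp(Path extracted, Path output) throws IOException {
        try {
            long lineCount;
            try (Stream<String> lines = Files.lines(extracted, StandardCharsets.UTF_8)) {
                lineCount = lines.count(); // placeholder analysis, assumed for illustration
            }
            // Store the result at the preset output path (stand-in for OutputFormat).
            Files.write(output, ("lines\t" + lineCount + "\n").getBytes(StandardCharsets.UTF_8));
        } finally {
            // Remove the temporarily stored decompressed file to free space.
            Files.deleteIfExists(extracted);
        }
    }

    public static void main(String[] args) throws IOException {
        Path extracted = Files.createTempFile("extracted-", ".txt");
        Path output = Files.createTempFile("result-", ".txt");
        Files.write(extracted, "a\nb\nc\n".getBytes(StandardCharsets.UTF_8));
        analyzeStoreAndCleanUp(extracted, output);
        System.out.println("temporary file still exists: " + Files.exists(extracted));
        System.out.print(new String(Files.readAllBytes(output), StandardCharsets.UTF_8));
        Files.deleteIfExists(output);
    }
}
```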
Preferably, the method of this embodiment can start multiple processes simultaneously, wherein in each process the data analysis function Map analyzes a decompressed file. After the data analysis functions Map analyze the decompressed files and obtain processing results, the method of this embodiment may further comprise: merging, by means of the merge function Reduce, the multiple processing results obtained by the data analysis functions Map in the multiple processes, to obtain a merged processing result; and finally outputting the merged processing result through the output function OutputFormat. By starting multiple Map tasks simultaneously, the method of this embodiment processes multiple RAR-format compressed files at the same time and improves execution efficiency. Fig. 3 is a flowchart of Hadoop reading and analyzing RAR-format compressed files according to an embodiment of the present invention.
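The parallel Map and merging Reduce arrangement can be illustrated with a small plain-Java model. The assumptions here are that threads in an ExecutorService stand in for Hadoop's separate Map tasks, that in-memory strings stand in for decompressed files, and that word counting stands in for the analysis logic; none of these names come from the Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelMapReduceSketch {

    /** Map step: counts word occurrences in the text of one decompressed file. */
    static Map<String, Integer> map(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Reduce step: merges the partial results of all map tasks into one sorted map. */
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> merged.merge(word, count, Integer::sum));
        }
        return merged;
    }

    /** Runs one map task per input concurrently, then merges the results. */
    static Map<String, Integer> run(List<String> texts) throws InterruptedException, ExecutionException {
        // Assumption: one thread per input, standing in for one Map task per archive.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, texts.size()));
        try {
            List<Future<Map<String, Integer>>> futures = new ArrayList<>();
            for (String text : texts) {
                Callable<Map<String, Integer>> task = () -> map(text);
                futures.add(pool.submit(task));
            }
            List<Map<String, Integer>> partials = new ArrayList<>();
            for (Future<Map<String, Integer>> future : futures) {
                partials.add(future.get()); // wait for each map task to finish
            }
            return reduce(partials);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(Arrays.asList("a b a", "b c")));
    }
}
```

Because each map task works on its own input and produces its own partial result, the tasks need no shared state; merging happens only once, in the reduce step.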
The method of this embodiment determines the RAR-format compressed file to be processed; obtains the pre-created file loading class and file decompression class; decompresses the RAR-format compressed file to be processed by calling the file decompression class from within the file loading class, to obtain a decompressed file; obtains the storage path of the decompressed file; and analyzes the decompressed file by means of the data analysis function, to obtain a processing result. It thereby solves the problem in the prior art that Hadoop cannot read and analyze RAR-format compressed files. In addition, the method starts multiple Map tasks simultaneously, so that multiple RAR-format compressed files to be processed are handled at the same time and execution efficiency is improved. Furthermore, after Map completes the analysis and obtains the processing result, the method deletes the temporarily stored decompressed file, which saves system space.
As can be seen from the above description, the method of the embodiment of the present invention creates the file loading class and the file decompression class on the basis of the input splitting function and the reading function, uses the file decompression class to read and decompress the RAR-format compressed file in HDFS to obtain a decompressed file, sends the decompressed file to Map for analysis to obtain a processing result, and finally stores the processing result in HDFS, which solves the problem in the prior art that Hadoop cannot read and analyze RAR-format compressed files. In this embodiment, Hadoop can also start multiple Map tasks simultaneously; each Map corresponds to one file decompression class, and each file decompression class reads one RAR-format compressed file, so that multiple RAR-format compressed files to be processed are handled at the same time and execution efficiency is improved. In addition, after Map completes the analysis and obtains the processing result, this embodiment deletes the temporarily stored decompressed file, which greatly saves system space.
It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system, such as one running a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the one given here.
An embodiment of the present invention also provides an apparatus for processing RAR-format compressed files. It should be noted that this apparatus can be used to execute the method for processing RAR-format compressed files of the embodiment of the present invention.
Fig. 4 is a schematic diagram of the apparatus for processing RAR-format compressed files according to an embodiment of the present invention. As shown in Fig. 4, the apparatus comprises: a determination module 10, a first acquisition module 20, a decompression module 30, a second acquisition module 40, and a processing module 50.
The determination module 10 is configured to determine the RAR-format compressed file to be processed.
The first acquisition module 20 is configured to obtain the pre-created file loading class and file decompression class.
The decompression module 30 is configured to decompress the RAR-format compressed file to be processed by calling the file decompression class from within the file loading class, to obtain a decompressed file.
Preferably, the decompression module 30 comprises: a first calling module, configured to call the file decompression class from within the file loading class; and a second calling module, configured to call, within the file decompression class, a decompression function in a decompression package to obtain the decompressed file, wherein the decompression package stores the decompression function used to decompress the RAR-format compressed file to be processed.
The second acquisition module 40 is configured to obtain the storage path of the decompressed file.
The processing module 50 is configured to analyze the decompressed file by means of the data analysis function, to obtain a processing result.
Preferably, the processing module 50 comprises: a second obtaining submodule, configured to obtain the return value of the file decompression class, wherein the return value is a character string corresponding to the storage path of the decompressed file; and a first processing submodule, configured to send the return value of the file decompression class to the data analysis function for analysis, to obtain the processing result.
Specifically, the first processing submodule comprises: a conversion module, configured to convert the return value of the file decompression class into a path-class address; a third obtaining submodule, configured to obtain the decompressed file stored at the path-class address; and a second processing submodule, configured to have the data analysis function analyze the decompressed file, to obtain the processing result.
Preferably, the apparatus of this embodiment further comprises: a deletion module, configured to delete the decompressed file stored at the path-class address; and a storage module, configured to store the processing result at the address corresponding to the preset storage path.
The apparatus of this embodiment comprises the determination module 10, the first acquisition module 20, the decompression module 30, the second acquisition module 40, and the processing module 50. This apparatus solves the problem in the prior art that Hadoop cannot read and analyze RAR-format compressed files. It also improves processing efficiency by reading and processing multiple RAR-format compressed files at the same time, and saves system space by deleting the temporarily stored decompressed files.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed across a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The foregoing is only the preferred embodiments of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.