Summary of the Invention
A primary object of the present invention is to provide a method and an apparatus for processing compressed files in RAR format, so as to solve the problem that Hadoop in the prior art cannot read and analyze compressed files in RAR format.
To achieve the above object, according to one aspect of the present invention, a method for processing a compressed file in RAR format is provided.
The method for processing a compressed file in RAR format includes: determining the RAR-format compressed file to be processed; obtaining a pre-created file loading class function and a pre-created file decompression class function; decompressing the RAR-format compressed file to be processed by calling the file decompression class function within the file loading class function, to obtain a decompressed file; obtaining the storage path of the decompressed file; and performing analysis processing on the decompressed file by means of a data analysis function, to obtain a processing result.
Further, decompressing the RAR-format compressed file to be processed by calling the file decompression class function within the file loading class function, to obtain the decompressed file, includes: calling the file decompression class function from within the file loading class function; and calling, from within the file decompression class function, a decompression function in a decompression package, to obtain the decompressed file, wherein the decompression package stores the decompression function that decompresses the RAR-format compressed file to be processed.
Further, performing analysis processing on the decompressed file by means of the data analysis function, to obtain the processing result, includes: obtaining the return value of the file decompression class function, wherein the return value of the file decompression class function is a character string corresponding to the storage path of the decompressed file; and sending the return value of the file decompression class function to the data analysis function for analysis processing, to obtain the processing result.
Further, sending the return value of the file decompression class function to the data analysis function for analysis processing, to obtain the processing result, includes: converting the return value of the file decompression class function into a path class address; obtaining the decompressed file stored at the path class address; and analyzing and processing the decompressed file by the data analysis function, to obtain the processing result.
Further, after the data analysis function analyzes and processes the decompressed file to obtain the processing result, the method further includes: deleting the decompressed file stored at the path class address; and storing the processing result at the address corresponding to a preset storage path.
Further, the method for processing a compressed file in RAR format starts multiple processes simultaneously, wherein in each process the data analysis function performs analysis processing on a decompressed file. After the data analysis function analyzes and processes the decompressed file to obtain the processing result, the method further includes: merging the multiple processing results obtained by the data analysis functions in the multiple processes, to obtain a merged processing result; and outputting the merged processing result.
To achieve the above object, according to another aspect of the present invention, an apparatus for processing a compressed file in RAR format is provided.
The apparatus for processing a compressed file in RAR format includes: a determining module, configured to determine the RAR-format compressed file to be processed; a first acquisition module, configured to obtain a pre-created file loading class function and a pre-created file decompression class function; a decompression module, configured to decompress the RAR-format compressed file to be processed by calling the file decompression class function within the file loading class function, to obtain a decompressed file; a second acquisition module, configured to obtain the storage path of the decompressed file; and a processing module, configured to perform analysis processing on the decompressed file by means of a data analysis function, to obtain a processing result.
Further, the decompression module includes: a first calling module, configured to call the file decompression class function from within the file loading class function; and a second calling module, configured to call, from within the file decompression class function, a decompression function in a decompression package, to obtain the decompressed file, wherein the decompression package stores the decompression function that decompresses the RAR-format compressed file to be processed.
Further, the processing module includes: a second acquisition submodule, configured to obtain the return value of the file decompression class function, wherein the return value of the file decompression class function is a character string corresponding to the storage path of the decompressed file; and a first processing submodule, configured to send the return value of the file decompression class function to the data analysis function for analysis processing, to obtain the processing result. The first processing submodule includes: a conversion module, configured to convert the return value of the file decompression class function into a path class address; a third acquisition submodule, configured to obtain the decompressed file stored at the path class address; and a second processing submodule, configured to have the data analysis function analyze and process the decompressed file, to obtain the processing result.
Further, the apparatus for processing a compressed file in RAR format further includes: a deletion module, configured to delete the decompressed file stored at the path class address; and a storage module, configured to store the processing result at the address corresponding to a preset storage path.
The present invention determines the RAR-format compressed file to be processed; obtains a pre-created file loading class function and a pre-created file decompression class function; decompresses the RAR-format compressed file to be processed by calling the file decompression class function within the file loading class function, to obtain a decompressed file; obtains the storage path of the decompressed file; and performs analysis processing on the decompressed file by means of a data analysis function, to obtain a processing result. This solves the problem that Hadoop in the prior art cannot read and analyze compressed files in RAR format. The invention creates the file loading class function and the file decompression class function on the basis of the input split function and the reading function, uses the file decompression class function to read and decompress the RAR-format compressed file in HDFS to obtain a decompressed file, sends the decompressed file to Map for analysis processing to obtain a processing result, and finally stores the processing result in HDFS. In the invention, Hadoop can start multiple Map tasks simultaneously; each Map corresponds to one file decompression class function, and each file decompression class function reads one RAR-format compressed file. Multiple RAR-format compressed files to be processed are thus handled at the same time, which improves execution efficiency. In addition, after Map has completed the analysis processing and obtained the processing result, the invention deletes the temporarily stored decompressed file, which saves system storage space.
Detailed Description of the Embodiments
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another. The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings of those embodiments. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in the present application without creative effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first", "second" and the like in the description and claims of the present application and in the above drawings are used to distinguish similar objects, and are not used to describe a specific order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances for the purposes of the embodiments of the present application described herein. In addition, the terms "include" and "have" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
The present invention is intended to provide a method and an apparatus for processing compressed files in RAR format.
Fig. 2 is a flowchart of the method for processing a compressed file in RAR format according to an embodiment of the present invention. As shown in Fig. 2, the method includes steps S101 to S105:
Step S101: determine the RAR-format compressed file to be processed.
HDFS usually stores many compressed files in RAR format. The processing method of this embodiment can read and process a single RAR-format compressed file, or multiple RAR-format compressed files. The number of RAR-format compressed files to be processed can be determined according to the specific analysis requirements. Preferably, the processing method of this embodiment determines multiple RAR-format compressed files to be processed; compared with reading the RAR-format compressed files one by one and processing each separately, this greatly improves processing efficiency.
Preferably, determining the RAR-format compressed file to be processed also includes obtaining the storage path of that file. The storage path is obtained so that the RAR-format compressed file to be processed, which is stored at the address corresponding to the storage path, can be located accurately.
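The step of determining the archives to be processed and collecting their storage paths can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: it assumes the archives sit in a local directory rather than in HDFS, and the class and method names (`RarDiscoverySketch`, `findRarArchives`) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class RarDiscoverySketch {
    /** Returns the storage paths of all *.rar files directly inside dir, sorted by path. */
    static List<Path> findRarArchives(Path dir) throws IOException {
        List<Path> archives = new ArrayList<>();
        // The glob keeps only entries whose file name ends in ".rar".
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, "*.rar")) {
            for (Path entry : entries) {
                archives.add(entry);
            }
        }
        Collections.sort(archives);
        return archives;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a local temporary directory stands in for the HDFS input directory.
        Path dir = Files.createTempDirectory("rar-input");
        Files.createFile(dir.resolve("a.rar"));
        Files.createFile(dir.resolve("notes.txt"));
        Files.createFile(dir.resolve("b.rar"));
        for (Path archive : findRarArchives(dir)) {
            System.out.println(archive.getFileName());
        }
    }
}
```

Each returned path can then be handed to one decompression task, which matches the one-archive-per-task arrangement described in this embodiment.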
Step S102: obtain the pre-created file loading class function and file decompression class function.
Processing and analyzing the RAR-format compressed file to be processed relies on class functions. For example, reading the RAR-format compressed file to be processed requires a file loading function, and decompressing it requires a file decompression function. The processing method of this embodiment creates the file loading class function RarInputFormat for RAR-format compressed files by inheriting from InputFormat, the input split function for ordinary files in the Hadoop framework, and creates the file decompression class function RarRecordReader for RAR-format compressed files by inheriting from RecordReader, the reading function for ordinary files in the Hadoop framework. RarInputFormat can read one or more RAR-format compressed files from HDFS and split the data of those files, generating several data file segments. RarRecordReader reads these data file segments and decompresses them to generate a decompressed file, obtains the storage path at which the decompressed file is stored, and passes that storage path as a parameter to the data analysis function Map. These two class functions together make it possible to read and decompress RAR-format compressed files.
Step S103: decompress the RAR-format compressed file to be processed by calling the file decompression class function within the file loading class function, to obtain a decompressed file.
Within the file loading class function RarInputFormat, the file decompression class function RarRecordReader is called to decompress the RAR-format compressed file, yielding a decompressed file. Specifically, decompressing the RAR-format compressed file in the file decompression class function RarRecordReader includes: calling the file decompression class function RarRecordReader from within the file loading class function RarInputFormat; and calling, from within the file decompression class function RarRecordReader, the decompression function in a decompression package, to obtain the decompressed file, wherein the decompression package stores the decompression function that decompresses the RAR-format compressed file to be processed. The decompression package in this embodiment is preferably the java-unrar-0.5.jar package, which can be downloaded from https://clojars.org/org.clojars.bonega/java-unrar. The file decompression class function RarRecordReader calls the decompression function in this jar package to decompress the RAR-format compressed file. By decompressing the RAR-format compressed file, the processing method of this embodiment makes the content of the file convenient to read or process.
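The call chain described above (loader calls reader, reader calls the library's decompression function and returns a storage path) can be sketched with plain Java classes. This is only an illustration under stated assumptions: the JDK has no RAR decoder, so ZIP from `java.util.zip` is substituted purely to show the control flow; `Extractor`, `ZipExtractor`, `ArchiveReader`, `ArchiveLoader` and `CallChainDemo` are hypothetical names, not Hadoop or java-unrar APIs; and a local temporary directory stands in for HDFS.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical stand-in for the decompression function stored in the decompression package. */
interface Extractor {
    Path extractFirstEntry(Path archive, Path targetDir) throws IOException;
}

/** ZIP substitute (assumption): a RAR-capable library would be called here instead. */
final class ZipExtractor implements Extractor {
    @Override
    public Path extractFirstEntry(Path archive, Path targetDir) throws IOException {
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(archive))) {
            ZipEntry entry = in.getNextEntry();
            if (entry == null) {
                throw new IOException("Archive has no entries: " + archive);
            }
            // Keep only the file name so an entry cannot escape targetDir.
            Path out = targetDir.resolve(Paths.get(entry.getName()).getFileName());
            Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
            return out;
        }
    }
}

/** Plays the role of the file decompression class function (RarRecordReader in the text). */
final class ArchiveReader {
    private final Extractor extractor;

    ArchiveReader(Extractor extractor) {
        this.extractor = extractor;
    }

    /** Decompresses the archive and returns the storage path of the result as a string. */
    String decompress(Path archive, Path tempDir) throws IOException {
        return extractor.extractFirstEntry(archive, tempDir).toString();
    }
}

/** Plays the role of the file loading class function (RarInputFormat in the text). */
final class ArchiveLoader {
    String load(Path archive, Path tempDir) throws IOException {
        // The loader calls the reader; the reader calls the library's decompression function.
        return new ArchiveReader(new ZipExtractor()).decompress(archive, tempDir);
    }
}

final class CallChainDemo {
    /** Writes a one-entry ZIP archive so the sketch can run without external files. */
    static Path writeSampleArchive(Path dir) throws IOException {
        Path archive = dir.resolve("sample.zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(archive))) {
            out.putNextEntry(new ZipEntry("data.txt"));
            out.write("hello\n".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return archive;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rar-sketch");
        String storagePath = new ArchiveLoader().load(writeSampleArchive(dir), dir);
        System.out.println("Decompressed file stored at: " + storagePath);
    }
}
```

Keeping the decompression call behind a small interface means the reader does not depend on a particular archive library, which is one way to isolate the jar package named in this embodiment.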
Step S104: obtain the storage path of the decompressed file.
After the file decompression class function RarRecordReader, called within the file loading class function RarInputFormat, has decompressed the RAR-format compressed file to be processed, a decompressed file is obtained. Preferably, the decompressed file is stored temporarily in HDFS. The storage path of the decompressed file in HDFS is passed to the data analysis function Map as the return value Value of the file decompression class function RarRecordReader. Supplying this storage path to the data analysis function Map as a parameter allows Map to locate, from the storage path, the decompressed file that corresponds to the RAR-format compressed file, and so makes it convenient for Map to analyze and process the decompressed file.
Step S105: perform analysis processing on the decompressed file by means of the data analysis function, to obtain a processing result.
After the data analysis function Map receives the return value Value of the file decompression class function RarRecordReader, it can analyze and process the decompressed file. Preferably, in the processing method of this embodiment, performing analysis processing on the decompressed file by means of the data analysis function, to obtain the processing result, may specifically include: obtaining the return value of the file decompression class function, wherein the return value is a character string corresponding to the storage path of the decompressed file; and sending the return value of the file decompression class function to the data analysis function for analysis processing, to obtain the processing result.
Specifically, sending the return value of the file decompression class function to the data analysis function for analysis processing, to obtain the processing result, may include: converting the return value of the file decompression class function into a path class address by means of the path class function Path in Hadoop; obtaining, by means of the data acquisition function FSDataInputStream, the decompressed file stored at the path class address to which the path class function Path points; and having the data analysis function complete the analysis processing of the decompressed file according to the business analysis logic, to obtain the processing result.
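The three sub-steps above (path string to path object, open a stream at that path, run the analysis logic) can be sketched as follows. The sketch deliberately uses `java.nio.file` on the local file system in place of Hadoop's `Path` and `FSDataInputStream`, so it runs without a cluster; counting non-empty lines is an assumed placeholder for the business analysis logic, and `PathAnalysisSketch` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class PathAnalysisSketch {
    /** Converts the path string to a Path, opens the file, and counts its non-empty lines. */
    static long analyze(String storagePath) throws IOException {
        // Sub-step 1: turn the string return value into a path object.
        Path path = Paths.get(storagePath);
        // Sub-step 2: open an input stream on the file stored at that path.
        try (InputStream in = Files.newInputStream(path);
             BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            // Sub-step 3: placeholder analysis logic (assumption): count non-empty lines.
            long count = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    count++;
                }
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("decompressed", ".txt");
        Files.write(file, "alpha\n\nbeta\n".getBytes(StandardCharsets.UTF_8));
        System.out.println("Non-empty lines: " + analyze(file.toString()));
    }
}
```

Passing the path as a string keeps the hand-off between the decompression step and the analysis step simple; the analysis side rebuilds a typed path only when it needs to open the file.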
Preferably, after the data analysis function analyzes and processes the decompressed file to obtain the processing result, the processing method of this embodiment further includes: deleting the decompressed file temporarily stored in HDFS at the path class address to which the path class function Path points, thereby releasing disk space; and outputting the processing result by means of the output function OutputFormat and storing it at the address corresponding to a preset storage path. Deleting the decompressed file temporarily stored in HDFS helps free system storage space.
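The clean-up described above can be sketched in a few lines. As before, this is an illustration on the local file system rather than HDFS, `CleanupSketch` and `finish` are hypothetical names, and writing the result as a UTF-8 text file is an assumption standing in for the output function.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class CleanupSketch {
    /** Deletes the temporary decompressed file, then stores the result at the preset path. */
    static void finish(Path decompressedFile, String result, Path outputFile) throws IOException {
        // Release the space held by the temporary decompressed file.
        Files.deleteIfExists(decompressedFile);
        // Make sure the preset output location exists before writing to it.
        Path parent = outputFile.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.write(outputFile, result.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path workDir = Files.createTempDirectory("cleanup-sketch");
        Path temp = Files.createFile(workDir.resolve("decompressed.txt"));
        Path output = workDir.resolve("results").resolve("part-0.txt");
        finish(temp, "lines=2\n", output);
        System.out.println("Temporary file still present: " + Files.exists(temp));
        System.out.println("Result stored at: " + output);
    }
}
```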
Preferably, the processing method of this embodiment can start multiple processes simultaneously, wherein in each process the data analysis function Map performs analysis processing on a decompressed file. After the data analysis function Map analyzes and processes the decompressed file to obtain the processing result, the processing method of this embodiment may further include: merging, by means of the merge function Reduce, the multiple processing results obtained by the data analysis function Map in the multiple processes, to obtain a merged processing result; and finally outputting the merged processing result by means of the output function OutputFormat. By starting multiple Map tasks simultaneously, the processing method of this embodiment processes multiple RAR-format compressed files at the same time, which improves execution efficiency.

Fig. 3 is a flowchart of Hadoop reading and analyzing compressed files in RAR format according to an embodiment of the present invention.
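The pattern of analyzing several decompressed files in parallel and then merging the partial results can be sketched as follows. The sketch makes several assumptions that are not in the embodiment: threads in one JVM stand in for separate Map processes, word counting stands in for the business analysis logic, and `ParallelMergeSketch` with its methods is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelMergeSketch {
    /** Map-like step: counts word occurrences in the content of one decompressed file. */
    static Map<String, Integer> analyze(String content) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : content.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Reduce-like step: sums the partial counts from all tasks into one sorted map. */
    static Map<String, Integer> merge(List<Map<String, Integer>> partials) {
        Map<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> merged.merge(word, count, Integer::sum));
        }
        return merged;
    }

    /** Runs one analysis task per input concurrently, then merges the results. */
    static Map<String, Integer> run(List<String> contents)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, contents.size()));
        try {
            List<Future<Map<String, Integer>>> futures = new ArrayList<>();
            for (String content : contents) {
                futures.add(pool.submit(() -> analyze(content)));
            }
            List<Map<String, Integer>> partials = new ArrayList<>();
            for (Future<Map<String, Integer>> future : futures) {
                partials.add(future.get()); // wait for each task to finish
            }
            return merge(partials);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(Arrays.asList("a b a", "b c")));
    }
}
```

Because each task works on its own input and the merge only runs after all tasks have finished, no shared mutable state is touched concurrently.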
The processing method of this embodiment determines the RAR-format compressed file to be processed; obtains a pre-created file loading class function and a pre-created file decompression class function; decompresses the RAR-format compressed file to be processed by calling the file decompression class function within the file loading class function, to obtain a decompressed file; obtains the storage path of the decompressed file; and performs analysis processing on the decompressed file by means of the data analysis function, to obtain a processing result. This solves the problem that Hadoop in the prior art cannot read and analyze compressed files in RAR format. At the same time, the processing method of this embodiment starts multiple Map tasks simultaneously, so that multiple RAR-format compressed files to be processed are handled at the same time, which improves execution efficiency. Moreover, after Map has completed the analysis processing and obtained the processing result, the processing method of this embodiment deletes the temporarily stored decompressed file, which saves system storage space.
As can be seen from the above description, the processing method of the embodiment of the present invention creates the file loading class function and the file decompression class function on the basis of the input split function and the reading function, uses the file decompression class function to read and decompress the RAR-format compressed file in HDFS to obtain a decompressed file, sends the decompressed file to Map for analysis processing to obtain a processing result, and finally stores the processing result in HDFS. This solves the problem that Hadoop in the prior art cannot read and analyze compressed files in RAR format. At the same time, in the embodiment of the invention Hadoop can start multiple Map tasks simultaneously; each Map corresponds to one file decompression class function, and each file decompression class function reads one RAR-format compressed file, so that multiple RAR-format compressed files to be processed are handled at the same time, which improves execution efficiency. In addition, after Map has completed the analysis processing and obtained the processing result, the embodiment of the invention deletes the temporarily stored decompressed file, which saves system storage space.
It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one given here.
An embodiment of the present invention further provides an apparatus for processing a compressed file in RAR format. It should be noted that this apparatus can be used to perform the method for processing a compressed file in RAR format according to the embodiment of the present invention.
Fig. 4 is a schematic diagram of the apparatus for processing a compressed file in RAR format according to an embodiment of the present invention. As shown in Fig. 4, the apparatus includes: a determining module 10, a first acquisition module 20, a decompression module 30, a second acquisition module 40 and a processing module 50.
The determining module 10 is configured to determine the RAR-format compressed file to be processed.
The first acquisition module 20 is configured to obtain the pre-created file loading class function and file decompression class function.
The decompression module 30 is configured to decompress the RAR-format compressed file to be processed by calling the file decompression class function within the file loading class function, to obtain a decompressed file.
Preferably, the decompression module 30 includes: a first calling module, configured to call the file decompression class function from within the file loading class function; and a second calling module, configured to call, from within the file decompression class function, the decompression function in a decompression package, to obtain the decompressed file, wherein the decompression package stores the decompression function that decompresses the RAR-format compressed file to be processed.
The second acquisition module 40 is configured to obtain the storage path of the decompressed file.
The processing module 50 is configured to perform analysis processing on the decompressed file by means of the data analysis function, to obtain a processing result.
Preferably, the processing module 50 includes: a second acquisition submodule, configured to obtain the return value of the file decompression class function, wherein the return value of the file decompression class function is a character string corresponding to the storage path of the decompressed file; and a first processing submodule, configured to send the return value of the file decompression class function to the data analysis function for analysis processing, to obtain the processing result.
Specifically, the first processing submodule includes: a conversion module, configured to convert the return value of the file decompression class function into a path class address; a third acquisition submodule, configured to obtain the decompressed file stored at the path class address; and a second processing submodule, configured to have the data analysis function analyze and process the decompressed file, to obtain the processing result.
Preferably, the apparatus of this embodiment further includes: a deletion module, configured to delete the decompressed file stored at the path class address; and a storage module, configured to store the processing result at the address corresponding to a preset storage path.
The apparatus for processing a compressed file in RAR format of this embodiment includes the determining module 10, the first acquisition module 20, the decompression module 30, the second acquisition module 40 and the processing module 50. The apparatus solves the problem that Hadoop in the prior art cannot read and analyze compressed files in RAR format. At the same time, it improves processing efficiency by reading and processing multiple RAR-format compressed files simultaneously, and it saves system storage space by deleting the temporarily stored decompressed files.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; alternatively, they can each be fabricated as an individual integrated circuit module, or several of the modules or steps can be fabricated as a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. For those skilled in the art, the present invention may be modified and varied in many ways. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.