CN110008192A

CN110008192A - Data file compression method, apparatus, equipment and readable storage medium

Info

Publication number: CN110008192A
Application number: CN201910295185.6A
Authority: CN
Inventors: 姜洪正; 张猛; 孙昊
Original assignee: Suzhou Wave Intelligent Technology Co Ltd
Current assignee: Suzhou Wave Intelligent Technology Co Ltd
Priority date: 2019-04-12
Filing date: 2019-04-12
Publication date: 2019-07-12

Abstract

The invention discloses a data file compression method in the field of data processing, comprising: when a data file compression request is received, determining the category of the data to be encoded/decoded according to a preset data classification rule, to obtain a data category; determining the Huffman tree corresponding to the data category according to pre-stored mapping data, to obtain a specified Huffman tree, wherein the mapping data stores the mapping between each data category and its corresponding Huffman tree obtained by big data analysis; and encoding/decoding the data to be encoded/decoded according to the specified Huffman tree. The method encodes and decodes the data of each category with per-category Huffman trees obtained by big data analysis, which improves speed while still giving a certain guarantee of the compression ratio. The invention also discloses a data file compression apparatus, equipment and a readable storage medium, which have the same beneficial effects.

Description

Data file compression method, apparatus, equipment and readable storage medium

Technical field

The present invention relates to the field of data processing, and in particular to a data file compression method, apparatus, equipment and a readable storage medium.

Background

Huffman coding is an efficient coding method widely used for data file compression. Its compression ratio is usually between 20% and 90%. Because of its high compression ratio and good performance, Huffman coding is now frequently used as a compression coding scheme, for example in static Huffman compression and dynamic Huffman compression. Both Huffman compression and decompression require a Huffman tree. A Huffman tree, also known as an optimal binary tree, is a binary tree with the shortest weighted path length. The weighted path length of a tree is the sum, over all leaf nodes in the tree, of each leaf's weight multiplied by its path length to the root node (if the root node is at layer 0, the path length from a leaf node to the root node is the layer number of the leaf node).
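The weighted path length defined above can be illustrated with a short sketch. It is not taken from the patent: the function name `weighted_path_length` is hypothetical, and the code relies on the standard fact that the weighted path length of a Huffman tree equals the sum of the weights of all merged (internal) nodes.

```python
import heapq

def weighted_path_length(weights):
    """Return the weighted path length of a Huffman tree over the given leaf weights.

    Every merge pushes the leaves beneath it one layer deeper, so adding the
    merged weight at each step yields sum(weight * depth) over all leaves.
    """
    heap = list(weights)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        # Always merge the two lightest remaining nodes.
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total
```

For leaf weights 5, 9, 12, 13, 16 and 45, the merges produce internal nodes of weight 14, 25, 30, 55 and 100, so the function returns 224.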

Compressing and decompressing with Huffman coding requires a Huffman coding tree to be constructed and maintained. Constructing the tree requires analysing the data in each data packet, with one Huffman tree per data packet. After the tree has been constructed and the corresponding encoding performed, the tree must also be maintained until the data is deleted. When data is compressed and decompressed with Huffman trees, this construction and maintenance occupies a large amount of resources and also introduces a certain delay into the encoding/decoding process.

Therefore, how to improve the efficiency of data file compression is a technical problem that those skilled in the art need to solve.

Summary of the invention

The object of the present invention is to provide a data file compression method that encodes and decodes the data of each category with per-category Huffman trees obtained by big data analysis, improving speed while still giving a certain guarantee of the compression ratio. A further object of the present invention is to provide a data file compression apparatus, equipment and a readable storage medium.

To solve the above technical problem, the present invention provides a data file compression method based on Huffman coding, comprising:

when a data file compression request is received, determining the category of the data to be encoded/decoded according to a preset data classification rule, to obtain a data category;

determining the Huffman tree corresponding to the data category according to pre-stored mapping data, to obtain a specified Huffman tree, wherein the mapping data stores the mapping between each data category and its corresponding Huffman tree obtained by big data analysis;

encoding/decoding the data to be encoded/decoded according to the specified Huffman tree.

Preferably, determining the Huffman tree corresponding to the data category according to the pre-stored mapping data comprises:

obtaining, from the mapping data, the Huffman tree identifier corresponding to the data category, to obtain a designated identifier;

extracting the identifier of each Huffman tree in the Huffman tree storage space, to obtain the identifier of each Huffman tree;

matching the designated identifier against the identifier of each Huffman tree, and taking the successfully matched Huffman tree as the Huffman tree corresponding to the data category.

Preferably, the method of constructing the mapping data comprises:

performing big data analysis on current data, and constructing a Huffman tree for each category according to the analysis result, to obtain default Huffman trees;

setting the default Huffman tree to a first priority;

when a Huffman tree upload request is received, receiving the Huffman tree uploaded by the user;

setting the Huffman tree uploaded by the user to a second priority, wherein the second priority is higher than the first priority.

Obtaining the Huffman tree identifier corresponding to the data category from the mapping data, to obtain the designated identifier, is then specifically: selecting the Huffman tree with the highest priority for each data category.

Preferably, the data file compression method further comprises:

when the change amplitude of the data in a first category exceeds the corresponding change amplitude threshold, maintaining and optimizing the Huffman tree of the first category according to the data of the first category, to obtain the optimized Huffman tree corresponding to the first category.

Preferably, the data file compression method further comprises:

when the local system is idle, recompressing the compressed data in the first category according to the optimized Huffman tree.

The present invention discloses a data file compression apparatus based on Huffman coding, comprising:

a data category determination unit, configured to determine, when a data file compression request is received, the category of the data to be encoded/decoded according to a preset data classification rule, to obtain a data category;

a Huffman tree determination unit, configured to determine the Huffman tree corresponding to the data category according to pre-stored mapping data, to obtain a specified Huffman tree, wherein the mapping data stores the mapping between each data category and its corresponding Huffman tree obtained by big data analysis;

a codec unit, configured to encode/decode the data to be encoded/decoded according to the specified Huffman tree.

Preferably, the Huffman tree determination unit comprises:

an identifier obtaining unit, configured to obtain, from the mapping data, the Huffman tree identifier corresponding to the data category, to obtain a designated identifier;

a tree identifier extraction unit, configured to extract the identifier of each Huffman tree in the Huffman tree storage space, to obtain the identifier of each Huffman tree;

an identifier matching unit, configured to match the designated identifier against the identifier of each Huffman tree, and to take the successfully matched Huffman tree as the Huffman tree corresponding to the data category.

Preferably, the data file compression apparatus further comprises: a maintenance optimization unit, configured to maintain and optimize, when the change amplitude of the data in a first category exceeds the corresponding change amplitude threshold, the Huffman tree of the first category according to the data of the first category, to obtain the optimized Huffman tree corresponding to the first category.

The present invention discloses data file compression equipment, comprising:

a memory, configured to store a program;

a processor, configured to implement the steps of the data file compression method when executing the program.

The present invention discloses a readable storage medium on which a program is stored, the program implementing the steps of the data file compression method when executed by a processor.

The data file compression method provided by the present invention classifies stored data by data characteristics on the basis of existing big data, and extracts the overall data characteristics of each category. For example, industry or file format can be used as the classification condition to analyse the data characteristics of different industries or of different file formats. The Huffman tree corresponding to each category is then constructed with the usual Huffman tree construction method; constructing it from the overall data characteristics guarantees the order of magnitude of the data compression ratio. Each category corresponds to a Huffman tree constructed from that category's characteristics, and the tree applies to all data in the category, i.e. all data of the category can be encoded/decoded with that tree. This avoids the high construction and maintenance cost incurred during encoding/decoding when each data packet has its own Huffman tree, and therefore effectively improves the speed of data file compression while guaranteeing a certain compression ratio.

The present invention also provides a data file compression apparatus, equipment and a readable storage medium, which have the above beneficial effects; these are not repeated here.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from the drawings provided without creative effort.

Fig. 1 is a flowchart of the data file compression method provided by an embodiment of the present invention;

Fig. 2 is a structural block diagram of the data file compression apparatus provided by an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of the data file compression equipment provided by an embodiment of the present invention.

Detailed description of the embodiments

The core of the present invention is to provide a data file compression method that effectively improves the speed of data file compression while guaranteeing a certain compression ratio. Another core of the present invention is to provide a data file compression apparatus, equipment and a readable storage medium.

To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

The data file compression method provided by the present invention is suitable for any application scenario in which data is processed with a Huffman tree, such as Huffman encoding, decoding, data compression and decompression; the specific application scenario is not limited here.

Referring to Fig. 1, which is a flowchart of the data file compression method provided in this embodiment, the method may include:

Step s110: when a data file compression request is received, determine the category of the data to be encoded/decoded according to a preset data classification rule, to obtain a data category.

When a data file compression request is received, the category of the data to be encoded/decoded is first determined according to the predefined data classification rule. The data file compression request may refer to data compression or to data decompression; any scenario in which data is compressed or decompressed with a Huffman tree is applicable. In addition, the present invention does not limit the basis on which data categories are divided. Data can be divided by industry, for example into medical data, IT data and financial data; by data format, for example into video data, text data, image data and compressed-package data; or by user habits, for example into user A data, user B data and user C data.

Because the basis for dividing data categories is not limited, and the way a category is determined depends on the type of category (different types of category generally call for different ways of determining it), the way of determining the category of the data to be encoded/decoded when a data file compression request is received is not limited either. For example, when data is divided by industry, the industry can be identified from keywords extracted from the data, from a preset industry label of the data packet, or from the file name of the data to be encoded/decoded. Because constructing a Huffman tree requires converting the data into binary form, the key information must be extracted before the binary conversion. When data is divided by format, the category can be determined by extracting the file's suffix or a format label. Only these cases are described here as examples; other ways can follow the description of this embodiment.
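As an illustration of format-based classification by file suffix, the following is a minimal sketch. The suffix table, the category names and the function `classify_by_suffix` are assumptions made for the example; the patent does not specify them.

```python
from pathlib import Path

# Hypothetical classification rule: file suffix -> data category.
SUFFIX_TO_CATEGORY = {
    ".mp4": "video", ".avi": "video",
    ".txt": "text", ".csv": "text",
    ".png": "image", ".jpg": "image",
    ".zip": "archive", ".gz": "archive",
}

def classify_by_suffix(filename, default="unknown"):
    """Return the data category for a file name, based on its last suffix."""
    suffix = Path(filename).suffix.lower()
    return SUFFIX_TO_CATEGORY.get(suffix, default)
```

A rule keyed on industry keywords or on a label carried by the data packet would have the same shape: a lookup performed on metadata before the payload is converted to binary.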

The present invention performs big data analysis and extracts Huffman trees by exploiting the inherent regularities shared by data packets within the same category. Data in the same category follow similar patterns, so the corresponding Huffman tree is extracted from the binary form of the stored data on the basis of existing big data. This preserves encoding/decoding efficiency, and the data of one category (for example, n data packets) is uniformly encoded/decoded with one or x (x less than n) Huffman trees, i.e. multiple data packets share a Huffman tree, which greatly reduces the construction and maintenance cost of Huffman trees. Constructing a Huffman tree from the extracted features can follow the existing Huffman tree construction steps, which are not repeated here.

The Huffman tree for each category is generally constructed while big data analysis is performed on that category's data, and can be called directly in later encoding/decoding without analysing data and constructing a tree on the spot, which reduces the time and resource cost of that process. The specific steps for constructing each category's Huffman tree from its data characteristics can follow the existing Huffman tree construction process, and the extraction of the Huffman tree may include the following steps:

1. Sort the characters by weight into a vector P.

2. After each merge, push_back the merged node to a vector Q; because a node merged later is never lighter than a node merged earlier, Q is also ordered.

3. Each time, compare the head elements of P and Q to extract the two smallest nodes, and merge them.
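The three steps above describe the two-queue construction of a Huffman tree. A minimal sketch follows, assuming symbol weights are given as a dictionary; the names `build_huffman_two_queues` and `code_table` and the tuple-based node layout are hypothetical choices for the example, not part of the patent.

```python
from collections import deque

def build_huffman_two_queues(freqs):
    """Build a Huffman tree from {symbol: weight} using two ordered queues.

    A leaf is (weight, symbol); an internal node is (weight, left, right).
    """
    if not freqs:
        raise ValueError("need at least one symbol")
    # Step 1: leaves sorted by weight go into P.
    p = deque(sorted(((w, s) for s, w in freqs.items()), key=lambda n: n[0]))
    # Step 2: merged nodes are appended to Q, which stays sorted by weight.
    q = deque()

    def pop_smallest():
        # Step 3: the lightest remaining node is at the head of P or Q.
        if not q or (p and p[0][0] <= q[0][0]):
            return p.popleft()
        return q.popleft()

    while len(p) + len(q) > 1:
        a = pop_smallest()
        b = pop_smallest()
        q.append((a[0] + b[0], a, b))
    return pop_smallest()

def code_table(node, prefix=""):
    """Walk the tree and return {symbol: bit string}."""
    if len(node) == 2:  # leaf
        return {node[1]: prefix or "0"}
    table = code_table(node[1], prefix + "0")
    table.update(code_table(node[2], prefix + "1"))
    return table
```

After the initial sort, each node is appended and removed at most once, so the merging phase runs in linear time, which is the usual reason for preferring two queues over a heap when the weights are already sorted.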

Even if the Huffman tree for each category is constructed on the spot during encoding/decoding, a category containing n data packets only needs one or x (x less than n) Huffman trees. Compared with the current one-to-one correspondence between data packets and Huffman trees, i.e. constructing n Huffman trees, this still saves the resources of n-1 or n-x tree constructions.

In addition, data may change continuously during encoding/decoding, for example by growing or shrinking. The pre-constructed Huffman tree for each category can be updated and maintained at any time as the data is updated, so that the tree stays in an optimal state and encoding/decoding efficiency is maintained. This embodiment does not limit the maintenance process of the Huffman tree.

Step s120: determine the Huffman tree corresponding to the data category according to pre-stored mapping data, to obtain a specified Huffman tree.

The mapping data stores the mapping between each data category and its corresponding Huffman tree. The specific form of the mapping data is not limited and can be, for example, a table or a document. The mapping data may store only each data category and the concrete form of the corresponding Huffman tree, or it may also store each data type, the classification basis, the storage address of the corresponding Huffman tree and so on, to make the data easier to inspect. The types of data stored in the mapping data are not limited here.

When each Huffman tree is generated, the system can by default assign it a unique Huffman identifier, so that when data is compressed or decompressed with that tree it can be effectively distinguished from other Huffman trees. To make it more efficient to obtain each category's Huffman tree, preferably the mapping data stores each category and the corresponding Huffman tree identifier. The mapping data can be stored at the front end where it is easy to look up, and the Huffman trees can be stored in a back-end database. Obtaining a Huffman tree by matching its identifier keeps the trees from occupying excessive memory during front-end lookup, makes extracting the corresponding tree after the data category is determined more efficient, and also suits scenarios in which one category corresponds to more than one Huffman tree. Determining the Huffman tree corresponding to the data category according to the pre-stored mapping data then specifically comprises the following steps:

Step 1: obtain, from the mapping data, the Huffman tree identifier corresponding to the data category, to obtain a designated identifier;

Step 2: extract the identifier of each Huffman tree in the Huffman tree storage space, to obtain the identifier of each Huffman tree;

Step 3: match the designated identifier against the identifier of each Huffman tree, and take the successfully matched Huffman tree as the Huffman tree corresponding to the data category.
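Steps 1 to 3 amount to an identifier lookup followed by a scan of the tree store. The sketch below illustrates this under stated assumptions: the identifiers, the in-memory list standing in for the Huffman tree storage space, and the function `find_tree` are all hypothetical.

```python
# Hypothetical mapping data kept at the front end: category -> tree identifier.
CATEGORY_TO_TREE_ID = {"video": "tree-1", "text": "tree-2"}

# Hypothetical stand-in for the back-end Huffman tree storage space.
TREE_STORE = [
    {"id": "tree-1", "codes": {"a": "0", "b": "1"}},
    {"id": "tree-2", "codes": {"a": "00", "b": "01", "c": "1"}},
]

def find_tree(category, mapping, store):
    """Return the stored tree whose identifier matches the category's identifier."""
    designated_id = mapping[category]        # step 1: designated identifier
    for tree in store:
        tree_id = tree["id"]                 # step 2: extract each tree's identifier
        if tree_id == designated_id:         # step 3: match
            return tree
    raise LookupError(f"no stored Huffman tree with identifier {designated_id!r}")
```

Only the small category-to-identifier map needs to be held where lookups happen; the trees themselves are fetched by identifier when needed, which is the separation the embodiment describes.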

When the system compresses data with a Huffman tree, the stored data is preceded by the unique identifier of the Huffman tree used, so that the corresponding Huffman tree can be used at decompression time to obtain the correct data; otherwise previously compressed data may be impossible to decompress.

Table 1 below shows an example of mapping data. The mapping table mainly includes the data categories, the Huffman tree label corresponding to each data category, and the storage address of the actual Huffman tree for each label. Table 1 is only an example; other forms of mapping data can follow the description of this embodiment.

Data category	Video data	Text data	Image data	Compressed-package data
Huffman tree	Huffman tree 1	Huffman tree 2	Huffman tree 3	Huffman tree 4
Storage address	0x00002045	0x00002057	0x00002063	0x00002067

Table 1

In addition, the Huffman tree for each category may be obtained by big data analysis alone. Alternatively, the tree obtained by big data analysis may serve as the default Huffman tree, while a Huffman tree specified by the user is accepted as a tree of higher level than the default. Specifically, users can upload and download the Huffman trees they use, and can upload their own Huffman tree (a certain data format is required) to be used in place of the default tree. The server can also allocate several Huffman trees to a user according to the recorded Huffman trees that user has used and the user's data storage habits, and intelligently assign the tree to use for compression according to the data the user wants to encode/decode; the priority of the default tree is lower than that of a tree set by the user. Ways of determining Huffman trees other than by the above analysis are not limited in this embodiment and can be set according to usage requirements.

When the mapping data is constructed by big data analysis alone, the construction may specifically include: dividing the data into categories according to a preset classification rule, to obtain the data of each category; performing big data feature analysis on the data of each category, to obtain its data patterns; constructing a Huffman tree for the data of each category according to those patterns, to obtain the default Huffman trees; and recording each data category with its corresponding default Huffman tree, to obtain the mapping data. That is, after features are extracted from the data of each category and a Huffman tree is constructed from the features common to the category, each category and its corresponding Huffman tree are recorded and stored to generate the mapping data.

Preferably, on the basis of the Huffman trees obtained by big data analysis, Huffman trees uploaded by users can additionally be used, to meet different users' encoding/decoding requirements for different data. Constructing the mapping data then specifically comprises the following steps:

setting the default Huffman tree to a first priority;

setting the Huffman tree uploaded by the user to a second priority, wherein the second priority is higher than the first priority.

Obtaining the Huffman tree identifier corresponding to the data category from the mapping data, to obtain the designated identifier, is then specifically: selecting the Huffman tree with the highest priority for each data category.

The data for which a second-priority Huffman tree is used can be specified; for example, a particular item of data, certain data, or data of certain types can be designated to be encoded/decoded with that tree. The construction of the mapping data has been described here only for the above two cases; the construction of mapping data that includes other types of Huffman tree can follow the above description and is not repeated here.
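The priority rule can be sketched as follows. The numeric priority values, the `TreeEntry` record and the function `select_tree_id` are assumptions made for illustration; the patent only states that a user-uploaded tree outranks the default tree.

```python
from dataclasses import dataclass

DEFAULT_PRIORITY = 1  # first priority: default tree from big data analysis
USER_PRIORITY = 2     # second priority: user-uploaded tree, outranks the default

@dataclass(frozen=True)
class TreeEntry:
    tree_id: str
    priority: int

def select_tree_id(mapping, category):
    """Return the identifier of the highest-priority tree registered for a category."""
    entries = mapping[category]
    return max(entries, key=lambda entry: entry.priority).tree_id
```

For example, with a default and a user-uploaded tree both registered under a hypothetical "text" category, `select_tree_id` returns the user-uploaded tree's identifier, while a category with only a default tree falls back to that default.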

Step s130: encode/decode the data to be encoded/decoded according to the specified Huffman tree.

After the Huffman tree corresponding to the current data to be encoded/decoded is obtained, the corresponding encoding/decoding is carried out with that tree. For example, when compressing data, the existing Huffman tree is used directly. This removes the delay that constructing a Huffman tree would otherwise cause during compression.
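Encoding and decoding with a prebuilt tree can be sketched as below, using a code table derived from the tree rather than the tree itself. The functions `encode` and `decode`, the sample table, and the use of a string of "0"/"1" characters instead of packed bits are simplifying assumptions for the example.

```python
# Hypothetical prefix-free code table derived from a category's prebuilt Huffman tree.
CODES = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}

def encode(data, codes):
    """Concatenate the code of each symbol; no tree is built at request time."""
    return "".join(codes[symbol] for symbol in data)

def decode(bits, codes):
    """Invert the table and consume bits until each prefix-free code is recognised."""
    inverse = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)
```

Because the table is fixed per category, the cost per request is only the table lookup; a symbol missing from the table raises `KeyError` in this sketch, which a real implementation would need to handle, for instance with an escape code.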

Taking one shared Huffman tree per category as an example, the encoding/decoding scheme of the present invention, compared with the existing scheme in which each data packet has its own Huffman tree, saves the construction and maintenance of n-1 Huffman trees for a data category containing n data packets, greatly reducing the resources occupied by constructing and maintaining Huffman trees during data compression.

It should be noted that in the present invention a category generally corresponds to a single Huffman tree, but is not limited to one, as long as the number of trees is less than the total number of data packets in the category. This embodiment takes one Huffman tree per category as an example.

The encoding/decoding method provided in this embodiment can be applied to storage in many industries or on specific occasions where Huffman coding is used; constructing the Huffman coding trees from big data greatly increases compression speed.

In summary, the data file compression method provided in this embodiment classifies stored data by data characteristics on the basis of existing big data, and extracts the overall data characteristics of each category. For example, industry or file format can be used as the classification condition to analyse the data characteristics of different industries or of different file formats. The Huffman tree corresponding to each category is then constructed with the usual Huffman tree construction method; constructing it from the overall data characteristics guarantees the order of magnitude of the data compression ratio. Each category corresponds to a Huffman tree constructed from that category's characteristics, and the tree applies to all data in the category, i.e. all data of the category can be encoded/decoded with that tree. This avoids the high construction and maintenance cost incurred during encoding/decoding when each data packet has its own Huffman tree, and therefore effectively improves the speed of data file compression while guaranteeing a certain compression ratio.

Building on the above embodiment, the data of each category on the server changes continuously during operation (for example, it increases, decreases, or specific data changes). To keep each category's Huffman tree in an optimal state and so maintain encoding/decoding efficiency, preferably the Huffman trees can be updated and maintained at any time. Specifically, when the change amplitude of the data in a first category exceeds the corresponding change amplitude threshold, the Huffman tree of the first category is maintained and optimized according to the data of the first category, to obtain the optimized Huffman tree corresponding to the first category. The first category can be any category.
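The patent does not define how the change amplitude is measured. The sketch below assumes, purely for illustration, that it is the total variation distance between the symbol distribution the tree was built from and the distribution of the category's current data; the function names and the threshold value are hypothetical, and both inputs are assumed to be non-empty.

```python
from collections import Counter

def distribution(data):
    """Relative frequency of each symbol in non-empty data."""
    counts = Counter(data)
    total = sum(counts.values())
    return {symbol: n / total for symbol, n in counts.items()}

def change_amplitude(old_dist, new_dist):
    """Total variation distance between two distributions, in [0, 1] (an assumed metric)."""
    symbols = set(old_dist) | set(new_dist)
    return 0.5 * sum(abs(old_dist.get(s, 0.0) - new_dist.get(s, 0.0)) for s in symbols)

def needs_rebuild(old_data, new_data, threshold):
    """True when the category's data has drifted past its change amplitude threshold."""
    return change_amplitude(distribution(old_data), distribution(new_data)) > threshold
```

Other measures, such as the relative growth in data volume, would fit the same pattern: compute a per-category figure, compare it with that category's threshold, and trigger the maintenance step only when it is exceeded.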

The specific maintenance and optimization steps are not limited and can follow the existing maintenance and optimization process for Huffman trees. For example, a user's personal Huffman tree can be maintained according to that user's usage data, or the big data pattern analysis of the category's data and the construction of the corresponding Huffman tree can be carried out again.

After the optimized Huffman tree is obtained, data is encoded/decoded according to it. To further reduce the space needlessly occupied by data that was encoded and compressed before the tree was optimized, compressed data can be recompressed with the better Huffman tree when the local system is idle, to obtain a better compression ratio, and the latest Huffman tree can be uploaded. How system idleness is determined can depend on factors such as system type, host type, working state and encoding/decoding performance requirements, and is not limited here.
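Recompression with an optimized tree can be sketched as decoding with the old code table and re-encoding with the new one in a single pass. The function `transcode` and the two sample tables are hypothetical, and bits are again modelled as a string of "0"/"1" characters for readability.

```python
def transcode(bits, old_codes, new_codes):
    """Decode a bit string with the old code table and re-encode it with the new one."""
    old_inverse = {code: symbol for symbol, code in old_codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        symbol = old_inverse.get(current)
        if symbol is not None:
            out.append(new_codes[symbol])  # emit the symbol's code under the new tree
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)
```

With a hypothetical old table `{"a": "00", "b": "01", "c": "1"}` and a new table `{"a": "0", "b": "10", "c": "11"}`, data dominated by "a" becomes shorter after transcoding. Scheduling this work for idle periods, as the embodiment suggests, keeps it off the request path.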

Referring to Fig. 2, which is a structural block diagram of the data file compression apparatus provided in this embodiment, the apparatus may include a data category determination unit 210, a Huffman tree determination unit 220 and a codec unit 230. The data file compression apparatus provided in this embodiment and the data file compression method described above can be read with reference to each other.

The data category determination unit 210 is mainly configured to determine, when a data file compression request is received, the category of the data to be encoded/decoded according to a preset data classification rule, to obtain a data category;

the Huffman tree determination unit 220 is mainly configured to determine the Huffman tree corresponding to the data category according to pre-stored mapping data, to obtain a specified Huffman tree, wherein the mapping data stores the mapping between each data category and its corresponding Huffman tree obtained by big data analysis;

the codec unit 230 is mainly configured to encode/decode the data to be encoded/decoded according to the specified Huffman tree.

The data file compression apparatus provided in this embodiment encodes and decodes the data of each category with per-category Huffman trees obtained by big data analysis, improving speed while still giving a certain guarantee of the compression ratio.

The Huffman tree determination unit may specifically include:

an identifier obtaining unit, configured to obtain, from the mapping data, the Huffman tree identifier corresponding to the data category, to obtain a designated identifier;

an identifier matching unit, configured to match the designated identifier against the identifier of each Huffman tree, and to take the successfully matched Huffman tree as the Huffman tree corresponding to the data category.

The data file compression apparatus provided in this embodiment may further include: a maintenance optimization unit, configured to maintain and optimize, when the change amplitude of the data in a first category exceeds the corresponding change amplitude threshold, the Huffman tree of the first category according to the data of the first category, to obtain the optimized Huffman tree corresponding to the first category.

The data file compression apparatus provided in this embodiment may further include: a recompression unit, configured to recompress, when the local system is idle, the compressed data in the first category according to the optimized Huffman tree.

This embodiment provides data file compression equipment, comprising a memory and a processor.

The memory is configured to store a program;

the processor is configured to implement the steps of the data file compression method when executing the program; for details, refer to the description of the data file compression method above.

This embodiment discloses a readable storage medium on which a program is stored; when executed by a processor, the program implements the steps of the data file compression method. For details, refer to the description of the data file compression method above.

Refer to FIG. 3, which is a schematic structural diagram of the data file compression equipment provided in this embodiment. The equipment may differ considerably depending on configuration or performance, and may include one or more central processing units (CPU) 322 and a memory 332, as well as one or more storage media 330 (for example, one or more mass storage devices) storing application programs 342 or data 344. The memory 332 and the storage medium 330 may provide transient storage or persistent storage. The program stored in the storage medium 330 may include one or more modules (not marked in the figure), and each module may include a series of instruction operations on the data processing equipment. Further, the central processing unit 322 may be configured to communicate with the storage medium 330 and to execute, on the data file compression equipment 301, the series of instruction operations stored in the storage medium 330.

The data file compression equipment 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.

The steps of the data file compression method described above with reference to FIG. 1 can be implemented by the structure of the data file compression equipment.

The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief; for the relevant details, refer to the description of the method.

Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

The data file compression method, device, equipment and readable storage medium provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principle and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. It should be pointed out that those of ordinary skill in the art may make several improvements and modifications to the present invention without departing from its principle, and these improvements and modifications also fall within the protection scope of the claims of the present invention.

Claims

1. A data file compression method based on Huffman coding, characterized by comprising:

when a data file compression request is received, determining, according to a preset data classification rule, the category to which the data to be encoded or decoded belongs, obtaining a data category;

determining, according to pre-stored mapping data, the Huffman tree corresponding to the data category, obtaining a specified Huffman tree; wherein the mapping data stores the mapping relationship, obtained through big data analysis, between each data category and its corresponding Huffman tree;

2. The data file compression method according to claim 1, characterized in that determining, according to the pre-stored mapping data, the Huffman tree corresponding to the data category comprises:

matching the designated identifier against the identifier of each Huffman tree, and taking the successfully matched Huffman tree as the Huffman tree corresponding to the data category.

3. The data file compression method according to claim 2, characterized in that the method of constructing the mapping data comprises:

performing big data analysis on current data, and constructing a Huffman tree for each category based on the analysis results, obtaining default Huffman trees;

setting the default Huffman tree to a first priority;

accordingly, obtaining the Huffman tree identifier corresponding to the data category from the mapping data to obtain the designated identifier is specifically: selecting the Huffman tree with the highest priority for the data of each category.

4. The data file compression method according to claim 1, characterized by further comprising:

when the variation amplitude of the data in a first category exceeds the corresponding variation amplitude threshold, performing maintenance optimization on the Huffman tree of the first category according to the data variation of the first category, obtaining the optimized Huffman tree corresponding to the first category.

5. The data file compression method according to claim 4, characterized by further comprising:

when the local system is idle, recompressing the already-compressed data in the first category according to the optimized Huffman tree.

6. A data file compression device based on Huffman coding, characterized by comprising:

a data category determination unit, configured to, when a data file compression request is received, determine, according to a preset data classification rule, the category to which the data to be encoded or decoded belongs, obtaining a data category;

a Huffman tree determination unit, configured to determine, according to pre-stored mapping data, the Huffman tree corresponding to the data category, obtaining a specified Huffman tree; wherein the mapping data stores the mapping relationship, obtained through big data analysis, between each data category and its corresponding Huffman tree;

7. The data file compression device according to claim 6, characterized in that the Huffman tree determination unit comprises:

an identifier determination unit, configured to obtain, from the mapping data, the Huffman tree identifier corresponding to the data category, obtaining a designated identifier;

a tree identifier extraction unit, configured to extract the identifier of each Huffman tree in the Huffman tree storage space, obtaining the identifier of each Huffman tree;

an identifier matching unit, configured to match the designated identifier against the identifier of each Huffman tree, and to take the successfully matched Huffman tree as the Huffman tree corresponding to the data category.

8. The data file compression device according to claim 6, characterized by further comprising: a maintenance optimization unit, configured to, when the variation amplitude of the data in a first category exceeds the corresponding variation amplitude threshold, perform maintenance optimization on the Huffman tree of the first category according to the data variation of the first category, obtaining the optimized Huffman tree corresponding to the first category.

9. Data file compression equipment, characterized by comprising:

a memory, configured to store a program;

a processor, configured to implement, when executing the program, the steps of the data file compression method according to any one of claims 1 to 5.

10. A readable storage medium, characterized in that a program is stored on the readable storage medium, and when the program is executed by a processor, the steps of the data file compression method according to any one of claims 1 to 5 are implemented.