CN110008192A - Data file compression method, apparatus, device, and readable storage medium
- Publication number
- CN110008192A (application number CN201910295185.6)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1744—Redundancy elimination performed by the file system using compression, e.g. sparse files
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/40—Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses a data file compression method in the field of data processing. The method comprises: upon receiving a data file compression request, determining the category of the data to be encoded or decoded according to a preset data classification rule, to obtain a data category; determining the Huffman tree corresponding to the data category according to pre-stored mapping data, to obtain a designated Huffman tree, where the mapping data stores the mapping between each data category and its corresponding Huffman tree as obtained through big data analysis; and encoding or decoding the designated data according to the designated Huffman tree. Because each category of data is encoded and decoded with a per-category Huffman tree obtained through big data analysis, the method improves processing speed while still providing a degree of assurance on the compression ratio. The invention also discloses a data file compression apparatus, a device, and a readable storage medium, which have the same beneficial effects.
Description
Technical field
The present invention relates to the field of data processing, and in particular to a data file compression method, apparatus, device, and readable storage medium.
Background art
Huffman coding is a widely used and highly efficient coding method for data file compression, with a compression ratio typically between 20% and 90%. Because of its high compression ratio and good performance, Huffman coding is often used as the compression coding scheme, for example in static Huffman compression and dynamic Huffman compression. Both Huffman compression and decompression require a Huffman tree. A Huffman tree, also called an optimal binary tree, is the binary tree with the shortest weighted path length. The weighted path length of a tree is the sum, over all leaf nodes in the tree, of each leaf's weight multiplied by its path length to the root node (if the root node is at level 0, the path length from a leaf node to the root equals the leaf's level).
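As an illustration of the definition above, the following minimal Python sketch computes the weighted path length of a small tree. The tree representation (a `(symbol, weight)` tuple for a leaf, a two-element list for an internal node) and the function name are assumptions made for this example only; they do not come from the patent.

```python
def weighted_path_length(tree, depth=0):
    """Sum of weight * depth over all leaves, with the root at depth 0.

    Hypothetical representation: a leaf is a (symbol, weight) tuple,
    an internal node is a two-element list [left, right].
    """
    if isinstance(tree, tuple):
        _symbol, weight = tree
        return weight * depth
    left, right = tree
    return (weighted_path_length(left, depth + 1)
            + weighted_path_length(right, depth + 1))


# "a" sits at depth 1; "b" and "c" sit at depth 2.
example_tree = [("a", 5), [("b", 2), ("c", 1)]]
print(weighted_path_length(example_tree))  # 5*1 + 2*2 + 1*2 = 11
```

Placing the heaviest symbol closest to the root is what keeps the sum small, which is the property a Huffman tree optimizes.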
A Huffman coding tree must be constructed and maintained during compression and decompression with Huffman coding. Constructing the tree requires analyzing the data in each data packet, with one Huffman tree per data packet, and after construction and the corresponding encoding are complete, the tree must continue to be maintained until the data is deleted. As a result, both the construction process and the compression and decompression of data with the Huffman tree occupy a large amount of resources and also introduce a certain delay into encoding and decoding.
Therefore, how to improve the efficiency of data file compression is a technical problem that those skilled in the art need to solve.
Summary of the invention
The object of the present invention is to provide a data file compression method that encodes and decodes each category of data with a per-category Huffman tree obtained through big data analysis, improving speed while still providing a degree of assurance on the compression ratio. A further object of the present invention is to provide a data file compression apparatus, a device, and a readable storage medium.
To solve the above technical problem, the present invention provides a data file compression method based on Huffman coding, comprising:
Upon receiving a data file compression request, determining the category of the data to be encoded or decoded according to a preset data classification rule, to obtain a data category;
Determining the Huffman tree corresponding to the data category according to pre-stored mapping data, to obtain a designated Huffman tree, wherein the mapping data stores the mapping between each data category and its corresponding Huffman tree obtained through big data analysis;
Encoding or decoding the designated data according to the designated Huffman tree.
Preferably, determining the Huffman tree corresponding to the data category according to the pre-stored mapping data comprises:
Obtaining the Huffman tree identifier corresponding to the data category from the mapping data, to obtain a designated identifier;
Extracting the identifier of each Huffman tree in the Huffman tree storage space, to obtain the identifier of each Huffman tree;
Matching the designated identifier against the identifier of each Huffman tree, and taking the successfully matched Huffman tree as the Huffman tree corresponding to the data category.
Preferably, the method of constructing the mapping data comprises:
Performing big data analysis on current data and constructing a Huffman tree for each category based on the analysis results, to obtain default Huffman trees;
Setting the default Huffman trees to a first priority;
Upon receiving a Huffman tree upload request, receiving the Huffman tree uploaded by the user;
Setting the Huffman tree uploaded by the user to a second priority, wherein the second priority is higher than the first priority;
In that case, obtaining the Huffman tree identifier corresponding to the data category from the mapping data, to obtain the designated identifier, is specifically: selecting the Huffman tree with the highest priority for the data of each category.
Preferably, the data file compression method further comprises:
When the magnitude of change of the data in a first category exceeds the corresponding change threshold, maintaining and optimizing the Huffman tree of the first category according to the state of the data in the first category, to obtain the optimized Huffman tree corresponding to the first category.
Preferably, the data file compression method further comprises:
When the local system is idle, recompressing the already compressed data in the first category according to the optimized Huffman tree.
The present invention discloses a data file compression apparatus based on Huffman coding, comprising:
A data category determination unit, configured to, upon receiving a data file compression request, determine the category of the data to be encoded or decoded according to a preset data classification rule, to obtain a data category;
A Huffman tree determination unit, configured to determine the Huffman tree corresponding to the data category according to pre-stored mapping data, to obtain a designated Huffman tree, wherein the mapping data stores the mapping between each data category and its corresponding Huffman tree obtained through big data analysis;
A codec unit, configured to encode or decode the designated data according to the designated Huffman tree.
Preferably, the Huffman tree determination unit comprises:
A designated identifier determination unit, configured to obtain the Huffman tree identifier corresponding to the data category from the mapping data, to obtain a designated identifier;
A tree identifier extraction unit, configured to extract the identifier of each Huffman tree in the Huffman tree storage space, to obtain the identifier of each Huffman tree;
An identifier matching unit, configured to match the designated identifier against the identifier of each Huffman tree and take the successfully matched Huffman tree as the Huffman tree corresponding to the data category.
Preferably, the data file compression apparatus further comprises: a maintenance optimization unit, configured to, when the magnitude of change of the data in a first category exceeds the corresponding change threshold, maintain and optimize the Huffman tree of the first category according to the state of the data in the first category, to obtain the optimized Huffman tree corresponding to the first category.
The present invention discloses a data file compression device, comprising:
A memory, configured to store a program;
A processor, configured to implement the steps of the data file compression method when executing the program.
The present invention discloses a readable storage medium on which a program is stored, the program implementing the steps of the data file compression method when executed by a processor.
The data file compression method provided by the present invention uses existing big data to classify stored data by data features and extracts the overall data features of each category of data. For example, industry or file format can serve as the classification criterion, and the data features of different industries or different file formats can be analyzed. The Huffman tree of each category is then constructed with the standard Huffman tree construction method; constructing the tree from overall data features keeps the data compression ratio at a guaranteed level. Each category corresponds to a Huffman tree constructed from that category's features, and the tree applies to all data in the category, that is, all data of the category can be encoded and decoded with that tree. This avoids the high construction and maintenance cost incurred during encoding and decoding when each data packet has its own Huffman tree, so the speed of data file compression can be effectively improved while a certain compression ratio is guaranteed.
The present invention also provides a data file compression apparatus, a device, and a readable storage medium, which have the above beneficial effects; details are not repeated here.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flow chart of the data file compression method provided in an embodiment of the present invention;
Fig. 2 is a structural block diagram of the data file compression apparatus provided in an embodiment of the present invention;
Fig. 3 is a structural schematic diagram of the data file compression device provided in an embodiment of the present invention.
Detailed description of the embodiments
The core of the present invention is to provide a data file compression method that can effectively improve the speed of data file compression while guaranteeing a certain compression ratio. Another core of the present invention is to provide a data file compression apparatus, a device, and a readable storage medium.
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The data file compression method provided by the present invention is suitable for any application scenario in which data is processed with a Huffman tree, such as Huffman encoding, decoding, data compression, and decompression; the specific application scenario is not limited here.
Referring to Fig. 1, which is a flow chart of the data file compression method provided in this embodiment, the method may include:
Step s110: upon receiving a data file compression request, determine the category of the data to be encoded or decoded according to a preset data classification rule, to obtain a data category.
When a data file compression request is received, the category of the data to be encoded or decoded is first determined according to the predetermined data classification rule. A data file compression request may refer to data compression or to data decompression; any scenario in which data is compressed or decompressed with a Huffman tree is applicable. The basis for dividing data into categories is not limited in the present invention. Data can be divided by industry, for example into medical data, IT data, and financial data; by data format, for example into video data, text data, image data, and archive data; or by user usage habits, for example into user A data, user B data, and user C data.
Because the basis for dividing data into categories is not limited above, and the way a data category is determined depends on the type of category (different category types generally call for different determination methods), the method of determining the category of the data to be encoded or decoded when a data file compression request is received is not limited either. For example, when data is categorized by industry, industry information can be identified from keywords extracted from the data, from a preset industry label carried by the data packet, or from the file name of the data to be encoded or decoded. Because constructing a Huffman tree requires converting the data into binary form, this key-information extraction must be completed before the binary conversion. When data is categorized by format, the category can be determined by extracting the file's suffix or a format label. Only these cases are described here as examples; other methods can follow the description of this embodiment.
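The format-based case can be sketched as a simple suffix lookup. This is a minimal illustration only: the suffix table, the category names, and the function name `classify_by_suffix` are hypothetical and are not specified by the patent.

```python
import os

# Hypothetical classification rule: file suffix -> data category.
FORMAT_CATEGORIES = {
    ".mp4": "video", ".avi": "video",
    ".txt": "text", ".csv": "text",
    ".png": "image", ".jpg": "image",
    ".zip": "archive", ".gz": "archive",
}


def classify_by_suffix(file_name, default="unknown"):
    """Return the data category for a file name based on its suffix."""
    _, suffix = os.path.splitext(file_name)
    return FORMAT_CATEGORIES.get(suffix.lower(), default)


print(classify_by_suffix("report.TXT"))   # text
print(classify_by_suffix("holiday.mp4"))  # video
```

An industry-based rule would replace the suffix table with keyword or label matching, but the overall shape (one function from a request's metadata to a category name) would stay the same.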
In the present invention, big data analysis exploits the inherent regularities shared by different data packets within the same category: data in the same category tends to follow similar patterns. The corresponding Huffman tree is derived from the binary form of the stored data using existing big data, which preserves encoding and decoding efficiency. The data of one category (for example, comprising n data packets) is uniformly encoded and decoded with one or x (x less than n) Huffman trees, that is, multiple data packets share a Huffman tree, which greatly reduces the cost of constructing and maintaining Huffman trees. Constructing a Huffman tree from the extracted features can follow the existing Huffman tree construction steps, which are not repeated here.
The Huffman tree of each category is generally constructed while big data analysis is performed on that category's data, so it can be called directly during later encoding and decoding without performing data analysis and Huffman tree construction on the spot, which reduces the time and resource cost of that process. The specific steps for constructing each category's Huffman tree from its data features can follow the existing Huffman tree construction process, which may include the following steps:
1. Sort the characters by weight in ascending order into a vector P.
2. After each merge, push_back the merged node onto a vector Q; because a node merged later is never lighter than one merged earlier, Q also stays ordered.
3. In each round, compare the head elements of P and Q to extract the two smallest nodes, and merge them.
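The two-queue procedure described above can be sketched in Python as follows. This is a minimal illustration under assumptions of this example: the node representation (a leaf is `(weight, symbol)`, an internal node is `(weight, left, right)`) and the function names are hypothetical, and `collections.deque` stands in for the vectors P and Q so that head removal is cheap.

```python
from collections import Counter, deque


def build_huffman_tree(data):
    """Two-queue Huffman construction over the symbols of `data`."""
    freq = Counter(data)
    if not freq:
        raise ValueError("cannot build a Huffman tree from empty data")
    # Step 1: P holds the leaves in ascending order of weight.
    p = deque(sorted(((w, s) for s, w in freq.items()), key=lambda leaf: leaf[0]))
    # Step 2: Q receives merged nodes; their weights never decrease,
    # so appending at the tail keeps Q sorted.
    q = deque()

    def pop_smallest():
        # Step 3: the lightest remaining node is at the head of P or Q.
        if not q or (p and p[0][0] <= q[0][0]):
            return p.popleft()
        return q.popleft()

    while len(p) + len(q) > 1:
        a = pop_smallest()
        b = pop_smallest()
        q.append((a[0] + b[0], a, b))
    return pop_smallest()


def code_table(node, prefix=""):
    """Walk the tree and return {symbol: bit string}."""
    if len(node) == 2:                      # leaf: (weight, symbol)
        return {node[1]: prefix or "0"}     # lone symbol still gets one bit
    _, left, right = node
    table = code_table(left, prefix + "0")
    table.update(code_table(right, prefix + "1"))
    return table


tree = build_huffman_tree("aaaabbc")
print(code_table(tree))  # the most frequent symbol "a" gets the shortest code
```

After the initial sort, each merge costs constant time, which is why this variant is often preferred over a heap when the weights are already ordered.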
Even if the Huffman tree of each category is constructed on the spot during encoding and decoding, a category containing n data packets needs only one or x (x less than n) Huffman trees. Compared with the current one-to-one correspondence between data packets and Huffman trees, which requires constructing n Huffman trees, this saves the resources of n-1 or n-x Huffman tree constructions.
In addition, data may change continuously during encoding and decoding, for example by growing or shrinking. The Huffman tree constructed in advance for each category can be updated and maintained at any time as the data is updated, so that the tree stays in an optimal state and encoding and decoding efficiency is preserved. The maintenance process of the Huffman tree is not limited in this embodiment.
Step s120: determine the Huffman tree corresponding to the data category according to the pre-stored mapping data, to obtain a designated Huffman tree.
The mapping data stores the mapping between each data category and its corresponding Huffman tree. The specific form of the mapping data is not limited; it may be a table, a document, or another form. The mapping data may store only each data category and the corresponding Huffman tree, or it may also store each data type, the classification basis, the storage address of the corresponding Huffman tree, and so on, to make the data easier to inspect. The kinds of data stored in the mapping data are not limited here.
When each Huffman tree is generated, the system can assign it a unique Huffman identifier by default, so that the tree can be effectively distinguished from other Huffman trees when compressing and decompressing data. To make retrieving each category's Huffman tree more efficient, the mapping data preferably stores each category together with its corresponding Huffman tree identifier. The mapping data can be stored at the front end, where it is easy to look up, and the Huffman trees can be stored in a back-end database. Matching and retrieving the Huffman tree by its identifier keeps the trees from occupying excessive front-end memory during lookup, makes extracting the corresponding tree after the data category is determined more efficient, and also suits scenarios in which one category corresponds to more than one Huffman tree. The process of determining the Huffman tree corresponding to the data category according to the pre-stored mapping data then specifically includes the following steps:
Step 1: obtain the Huffman tree identifier corresponding to the data category from the mapping data, to obtain a designated identifier;
Step 2: extract the identifier of each Huffman tree in the Huffman tree storage space, to obtain the identifier of each Huffman tree;
Step 3: match the designated identifier against the identifier of each Huffman tree, and take the successfully matched Huffman tree as the Huffman tree corresponding to the data category.
Before storing data compressed with a Huffman tree, the system must record the unique identifier of the Huffman tree that was used, so that the corresponding Huffman tree can be used during decompression to recover the correct data; otherwise, the previously compressed data may not be decompressible.
Table 1 below shows an example of mapping data. The mapping table mainly includes the data category, the Huffman tree identifier corresponding to each data category, and the storage address of the actual Huffman tree corresponding to each identifier. Table 1 is only an example; other forms of mapping data can follow the description of this embodiment.
Data category | Video data | Text data | Image data | Archive data |
Huffman tree identifier | Huffman tree 1 | Huffman tree 2 | Huffman tree 3 | Huffman tree 4 |
Storage address | 0x00002045 | 0x00002057 | 0x00002063 | 0x00002067 |
Table 1
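As an illustration only, the mapping in Table 1 and the three lookup steps can be sketched in Python as below. The dictionary layout, the `tree_id` strings, the in-memory `TREE_STORE` standing in for the back-end Huffman tree storage space, and the placeholder code tables are all assumptions of this sketch, not structures defined by the patent.

```python
# Mapping data modelled on Table 1: category -> tree identifier and address.
MAPPING = {
    "video":   {"tree_id": "huffman_tree_1", "address": 0x00002045},
    "text":    {"tree_id": "huffman_tree_2", "address": 0x00002057},
    "image":   {"tree_id": "huffman_tree_3", "address": 0x00002063},
    "archive": {"tree_id": "huffman_tree_4", "address": 0x00002067},
}

# Hypothetical stand-in for the Huffman tree storage space. Each record
# carries the tree's unique identifier and a placeholder code table.
TREE_STORE = [
    {"id": "huffman_tree_1", "codes": {"a": "0", "b": "1"}},
    {"id": "huffman_tree_2", "codes": {"a": "0", "b": "10", "c": "11"}},
    {"id": "huffman_tree_3", "codes": {"x": "0", "y": "1"}},
    {"id": "huffman_tree_4", "codes": {"p": "0", "q": "1"}},
]


def find_tree(category, mapping=MAPPING, store=TREE_STORE):
    """Return the stored tree record that matches a data category."""
    designated_id = mapping[category]["tree_id"]   # step 1: designated identifier
    for record in store:                           # step 2: read each stored identifier
        if record["id"] == designated_id:          # step 3: match
            return record
    raise LookupError(f"no stored Huffman tree with id {designated_id!r}")


print(find_tree("text")["id"])  # huffman_tree_2
```

Keeping only identifiers and addresses in the mapping, and the trees themselves in a separate store, mirrors the front-end and back-end split described in the text.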
In addition, the Huffman tree of each category may be obtained through big data analysis alone. Alternatively, the Huffman tree obtained through big data analysis may serve as the default Huffman tree, while a Huffman tree specified by the user is accepted as a tree with a higher priority than the default. Huffman trees can be uploaded and downloaded; specifically, a user can upload their own Huffman tree (which must follow a certain data format) for use in place of the default Huffman tree. The server can also assign several Huffman trees to a user based on the recorded Huffman trees that user has used and the user's personal data storage habits, and intelligently choose which Huffman tree to use for data file compression according to the data the user wants to encode or decode; the default tree has a lower priority than a tree the user has set. How Huffman trees other than those obtained through the above analysis are determined is not limited in this embodiment and can be set according to usage requirements.
When the mapping data is constructed through big data analysis alone, the construction process may specifically include: dividing data into categories according to the preset classification rule, to obtain the data of each category; performing big data feature analysis on the data of each category, to obtain its data patterns; constructing a Huffman tree for each category according to those patterns, to obtain the default Huffman trees; and recording each category together with its default Huffman tree, to obtain the mapping data. That is, features are extracted from the data of each category, a Huffman tree is constructed from the features common to the category, and each category and its corresponding Huffman tree are recorded and stored to generate the mapping data.
Preferably, in addition to the Huffman trees obtained through big data analysis, Huffman trees uploaded by users can also be used, to accommodate the different encoding and decoding requirements that different users have for different data. The construction process of the mapping data then specifically includes the following steps:
Perform big data analysis on current data and construct a Huffman tree for each category based on the analysis results, to obtain default Huffman trees;
Set the default Huffman trees to a first priority;
Upon receiving a Huffman tree upload request, receive the Huffman tree uploaded by the user;
Set the Huffman tree uploaded by the user to a second priority, where the second priority is higher than the first priority;
In that case, obtaining the Huffman tree identifier corresponding to the data category from the mapping data, to obtain the designated identifier, is specifically: selecting the Huffman tree with the highest priority for the data of each category.
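The priority rule above can be sketched as follows. The `TreeEntry` record, the numeric priority values, and the function name are hypothetical choices for this illustration; the patent only requires that a user-uploaded tree outrank the default tree.

```python
from dataclasses import dataclass

DEFAULT_PRIORITY = 1   # first priority: trees built by big data analysis
USER_PRIORITY = 2      # second priority: user-uploaded trees (higher wins)


@dataclass(frozen=True)
class TreeEntry:
    tree_id: str
    category: str
    priority: int


def select_tree_id(entries, category):
    """Pick the identifier of the highest-priority tree for a category."""
    candidates = [e for e in entries if e.category == category]
    if not candidates:
        raise LookupError(f"no Huffman tree registered for {category!r}")
    return max(candidates, key=lambda e: e.priority).tree_id


entries = [
    TreeEntry("default_text_tree", "text", DEFAULT_PRIORITY),
    TreeEntry("default_image_tree", "image", DEFAULT_PRIORITY),
    TreeEntry("user_text_tree", "text", USER_PRIORITY),
]
print(select_tree_id(entries, "text"))   # user_text_tree
print(select_tree_id(entries, "image"))  # default_image_tree
```

A category with no uploaded tree simply falls back to its default tree, since that is then the only candidate.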
A Huffman tree of the second priority may be bound to specific data, for example by specifying that a particular piece of data, certain data, or data of certain types be encoded and decoded with that tree. Only the two approaches above are used here to describe the construction of the mapping data; the construction of mapping data that includes other types of Huffman trees can follow the description above and is not repeated here.
Step s130: encode or decode the designated data according to the designated Huffman tree.
After the Huffman tree corresponding to the data currently to be encoded or decoded is obtained, the corresponding encoding or decoding is performed with that tree. For example, during data compression, the existing Huffman tree is used to compress the data directly, which removes the delay that constructing a Huffman tree would otherwise add to the compression process.
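Encoding and decoding with an already available tree reduces to table lookups, as the following minimal sketch shows. The code table `TEXT_CODES` is a made-up prefix-free table standing in for a category's pre-built Huffman tree, and bit strings are used instead of packed bytes to keep the example short; both are assumptions of this illustration.

```python
# Hypothetical pre-built, prefix-free code table for one data category.
TEXT_CODES = {"a": "0", "b": "10", "c": "110", "d": "111"}


def encode(data, codes):
    """Concatenate the code of each symbol; no tree is built here."""
    return "".join(codes[symbol] for symbol in data)


def decode(bits, codes):
    """Read bits until they match a code, emit the symbol, and repeat."""
    reverse = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("bit stream ends in the middle of a code")
    return "".join(out)


bits = encode("abacad", TEXT_CODES)
print(bits)                      # 01001100111
print(decode(bits, TEXT_CODES))  # abacad
```

One consequence of sharing a table across a category is visible here: `encode` raises `KeyError` for a symbol the table does not contain, so a practical shared table would presumably need an entry for every symbol that can occur (for example all 256 byte values). The patent does not discuss this case.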
Taking one shared Huffman tree per category as an example, compared with the existing approach in which each data packet has its own Huffman tree, the encoding and decoding approach of the present invention eliminates the construction and maintenance of n-1 Huffman trees for a data category containing n data packets, which greatly reduces the resources occupied by constructing and maintaining Huffman trees during data compression.
It should be noted that in the present invention a category generally corresponds to a single Huffman tree, but it is not limited to one; the number only needs to be less than the total number of data packets in the category. This embodiment takes one Huffman tree per category as an example.
The encoding and decoding method provided in this embodiment can be applied to storage scenarios in many industries or to storage in specific settings where Huffman coding is used. Constructing the Huffman coding trees with big data greatly improves compression speed.
In summary, the data file compression method provided in this embodiment uses existing big data to classify stored data by data features and extracts the overall data features of each category of data. For example, industry or file format can serve as the classification criterion, and the data features of different industries or different file formats can be analyzed. The Huffman tree of each category is then constructed with the standard Huffman tree construction method; constructing the tree from overall data features keeps the data compression ratio at a guaranteed level. Each category corresponds to a Huffman tree constructed from that category's features, and the tree applies to all data in the category, that is, all data of the category can be encoded and decoded with that tree. This avoids the high construction and maintenance cost incurred during encoding and decoding when each data packet has its own Huffman tree, so the speed of data file compression can be effectively improved while a certain compression ratio is guaranteed.
Building on the above embodiment, the data of each category on a server changes continuously during operation (for example, it grows, shrinks, or specific data changes). To keep the Huffman tree of each category in an optimal state and thus preserve encoding and decoding efficiency, the Huffman tree can preferably be updated and maintained at any time. Specifically, when the magnitude of change of the data in a first category exceeds the corresponding change threshold, the Huffman tree of the first category is maintained and optimized according to the state of the data in the first category, to obtain the optimized Huffman tree corresponding to the first category. The first category can be any category.
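The patent does not define how the magnitude of change is measured. As one possible reading, the sketch below compares the symbol frequency distribution the current tree was built from with the distribution of the category's current data, using total variation distance, and flags the tree for rebuilding when the distance exceeds a threshold. The metric, the default threshold of 0.2, and all function names are assumptions of this example.

```python
from collections import Counter


def distribution(data):
    """Relative frequency of each symbol in `data`."""
    counts = Counter(data)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {symbol: count / total for symbol, count in counts.items()}


def change_amplitude(old_dist, new_dist):
    """Total variation distance between two distributions, in [0, 1]."""
    symbols = set(old_dist) | set(new_dist)
    return 0.5 * sum(abs(old_dist.get(s, 0.0) - new_dist.get(s, 0.0))
                     for s in symbols)


def needs_rebuild(old_data, new_data, threshold=0.2):
    """True when the category's data has drifted past the threshold."""
    return change_amplitude(distribution(old_data),
                            distribution(new_data)) > threshold


print(needs_rebuild("aaab", "aaab"))  # False: nothing changed
print(needs_rebuild("aaab", "abbb"))  # True: frequencies swapped
```

Other measures, such as the relative change in data volume, would fit the same structure; only `change_amplitude` would need to change.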
The specific steps of maintenance and optimization are not limited and can follow the existing maintenance and optimization process for Huffman trees. For example, a user's personal Huffman tree can be maintained according to the data that user has used, or the big data pattern analysis of the category's data and the construction of the corresponding Huffman tree can be performed again.
After the optimized Huffman tree is obtained, data is encoded and decoded according to it. To further reduce the storage needlessly occupied by data that was compressed before the tree was optimized, the already compressed data can be recompressed with the better Huffman tree when the local system is idle, to obtain a better compression ratio, and the latest Huffman tree can be uploaded. How system idleness is determined can depend on factors such as the system type, host type, working state, and encoding and decoding requirements, and is not limited here.
Referring to Fig. 2, which is a structural block diagram of the data file compression apparatus provided in this embodiment, the apparatus may include: a data category determination unit 210, a Huffman tree determination unit 220, and a codec unit 230. The data file compression apparatus provided in this embodiment and the data file compression method described above can be read with reference to each other.
The data category determination unit 210 is mainly configured to, upon receiving a data file compression request, determine the category of the data to be encoded or decoded according to a preset data classification rule, to obtain a data category;
The Huffman tree determination unit 220 is mainly configured to determine the Huffman tree corresponding to the data category according to pre-stored mapping data, to obtain a designated Huffman tree, where the mapping data stores the mapping between each data category and its corresponding Huffman tree obtained through big data analysis;
The codec unit 230 is mainly configured to encode or decode the designated data according to the designated Huffman tree.
The data file compression apparatus provided by this embodiment encodes and decodes the data of each category using the per-category Huffman trees obtained through big data analysis, which improves speed while still providing a certain guarantee on the compression ratio.
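The cooperation of the three units can be sketched in Python as follows. This is only a minimal illustration: the extension-based classification rule, the category names and the hard-coded code tables in `MAPPING` are all hypothetical stand-ins, since the patent leaves the classification rule and the storage format of the trees open.

```python
from typing import Dict

# Hypothetical classification rule: the category is chosen by file extension.
CATEGORY_RULES: Dict[str, str] = {".log": "text_log", ".csv": "table"}

# Hypothetical mapping data: category -> prefix-code table (stand-in for a tree
# that would have been produced offline by big data analysis).
MAPPING: Dict[str, Dict[str, str]] = {
    "text_log": {"a": "0", "b": "10", "c": "11"},
    "table": {"a": "11", "b": "10", "c": "0"},
}


def determine_category(filename: str) -> str:
    """Role of the data category determination unit."""
    for suffix, category in CATEGORY_RULES.items():
        if filename.endswith(suffix):
            return category
    raise ValueError(f"no classification rule matches {filename!r}")


def determine_tree(category: str) -> Dict[str, str]:
    """Role of the Huffman tree determination unit."""
    return MAPPING[category]


def encode(data: str, codes: Dict[str, str]) -> str:
    """Encoding half of the codec unit (bit string instead of packed bytes)."""
    return "".join(codes[symbol] for symbol in data)


def decode(bits: str, codes: Dict[str, str]) -> str:
    """Decoding half of the codec unit."""
    inverse = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


def compress_file(filename: str, data: str) -> str:
    codes = determine_tree(determine_category(filename))
    return encode(data, codes)


bits = compress_file("server.log", "aabc")
print(bits, decode(bits, MAPPING["text_log"]))
```

No tree is built at request time in this sketch; the lookup replaces the per-file frequency count, which is where the speed gain claimed above would come from.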
The Huffman tree determination unit may specifically include:
an identifier determination unit, configured to obtain from the mapping data the Huffman tree identifier corresponding to the data category, obtaining a specified identifier;
a tree identifier extraction unit, configured to extract the identifier of each Huffman tree in the Huffman tree storage space, obtaining the identifier of each Huffman tree;
an identifier matching unit, configured to match the specified identifier against the identifier of each Huffman tree, and to take the successfully matched Huffman tree as the Huffman tree corresponding to the data category.
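The three sub-units above amount to an identifier lookup followed by a scan of the tree store. A minimal Python sketch, assuming a hypothetical `StoredTree` record and string identifiers (neither is prescribed by the patent), might look like this:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StoredTree:
    tree_id: str           # identifier stored alongside the tree
    codes: Dict[str, str]  # symbol -> bit string, standing in for the tree


def find_tree(category: str,
              mapping: Dict[str, str],
              storage: List[StoredTree]) -> Optional[StoredTree]:
    # Step 1: obtain the specified identifier for the category from the mapping data.
    specified_id = mapping.get(category)
    if specified_id is None:
        return None
    # Steps 2 and 3: read each stored tree's identifier and return the match.
    for tree in storage:
        if tree.tree_id == specified_id:
            return tree
    return None


mapping = {"text_log": "tree-log-v1"}
storage = [
    StoredTree("tree-table-v1", {"a": "1", "b": "0"}),
    StoredTree("tree-log-v1", {"a": "0", "b": "1"}),
]
print(find_tree("text_log", mapping, storage))
```

Keeping identifiers in the mapping data, rather than the trees themselves, lets the tree store be updated independently of the category mapping.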
The data file compression apparatus provided by this embodiment may further include a maintenance optimization unit, configured to, when the amplitude of variation of the data in a first category exceeds the corresponding variation threshold, perform maintenance optimization on the Huffman tree of the first category according to how the data of the first category has changed, obtaining the optimized Huffman tree corresponding to the first category.
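The patent does not define how the amplitude of variation is measured. The Python sketch below assumes one possible measure, the total variation distance between the symbol distribution the current tree was built from and the distribution of the data now in the category; the function names and the threshold value are likewise illustrative only.

```python
from collections import Counter
from typing import Dict


def distribution(data: str) -> Dict[str, float]:
    """Relative frequency of each symbol; empty input gives an empty distribution."""
    counts = Counter(data)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {symbol: n / total for symbol, n in counts.items()}


def variation_amplitude(old: Dict[str, float], new: Dict[str, float]) -> float:
    """Total variation distance between two distributions (0.0 to 1.0)."""
    symbols = set(old) | set(new)
    return 0.5 * sum(abs(old.get(s, 0.0) - new.get(s, 0.0)) for s in symbols)


def needs_optimization(old_data: str, new_data: str, threshold: float) -> bool:
    """True when the category has drifted enough to justify rebuilding its tree."""
    amplitude = variation_amplitude(distribution(old_data), distribution(new_data))
    return amplitude > threshold


# Assumed threshold of 0.2 for this category.
print(needs_optimization("aaab", "abbb", threshold=0.2))
print(needs_optimization("aaab", "abaa", threshold=0.2))
```

When the check returns true, the tree for that category would be rebuilt from the current frequencies and stored as the optimized tree.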
The data file compression apparatus provided by this embodiment may further include a recompression unit, configured to recompress the compressed data in the first category according to the optimized Huffman tree when the local system is idle.
This embodiment provides a data file compression device, comprising a memory and a processor.
The memory is configured to store a program.
The processor is configured to implement the steps of the data file compression method when executing the program; for details, refer to the description of the data file compression method above.
This embodiment also discloses a readable storage medium on which a program is stored. When the program is executed by a processor, the steps of the data file compression method are implemented; for details, refer to the description of the data file compression method above.
Referring to FIG. 3, which is a schematic structural diagram of the data file compression device provided by this embodiment: the device may differ considerably depending on its configuration or performance, and may include one or more central processing units (CPU) 322, a memory 332, and one or more storage media 330 (for example, one or more mass storage devices) storing application programs 342 or data 344. The memory 332 and the storage medium 330 may provide transient or persistent storage. The program stored in the storage medium 330 may include one or more modules (not marked in the figure), and each module may include a series of instruction operations on the data processing device. Further, the central processing unit 322 may be configured to communicate with the storage medium 330 and to execute, on the data file compression device 301, the series of instruction operations in the storage medium 330.
The data file compression device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™ or FreeBSD™.
The steps of the data file compression method described above with reference to FIG. 1 can be implemented by the structure of the data file compression device.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, it is described relatively briefly; for the relevant details, refer to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate clearly the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are implemented in hardware or in software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may implement the described functions in different ways for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The data file compression method, apparatus, device and readable storage medium provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. It should be noted that those of ordinary skill in the art can make several improvements and modifications to the invention without departing from its principle, and these improvements and modifications also fall within the protection scope of the claims of the present invention.
Claims (10)
1. A data file compression method based on Huffman coding, characterized by comprising:
when a data file compression request is received, determining, according to a preset data classification rule, the category to which the data to be encoded or decoded belongs, to obtain a data category;
determining the Huffman tree corresponding to the data category according to pre-stored mapping data, to obtain a specified Huffman tree; wherein the mapping data stores the mapping relationships, obtained through big data analysis, between each data category and its corresponding Huffman tree;
encoding or decoding the specified data according to the specified Huffman tree.
2. The data file compression method according to claim 1, characterized in that determining the Huffman tree corresponding to the data category according to the pre-stored mapping data comprises:
obtaining, from the mapping data, the Huffman tree identifier corresponding to the data category, to obtain a specified identifier;
extracting the identifier of each Huffman tree in the Huffman tree storage space, to obtain the identifier of each Huffman tree;
matching the specified identifier against the identifier of each Huffman tree, and taking the successfully matched Huffman tree as the Huffman tree corresponding to the data category.
3. The data file compression method according to claim 2, characterized in that the method for constructing the mapping data comprises:
performing big data analysis on current data, and constructing a Huffman tree for each category based on the analysis results, to obtain default Huffman trees;
setting the default Huffman trees to a first priority;
when a Huffman tree upload request is received, receiving the Huffman tree uploaded by a user;
setting the Huffman tree uploaded by the user to a second priority, wherein the second priority is higher than the first priority;
in which case obtaining, from the mapping data, the Huffman tree identifier corresponding to the data category to obtain the specified identifier is specifically: selecting, for each data category, the Huffman tree with the highest priority.
4. The data file compression method according to claim 1, characterized by further comprising:
when the amplitude of variation of the data in a first category exceeds the corresponding variation threshold, performing maintenance optimization on the Huffman tree of the first category according to the data changes of the first category, to obtain the optimized Huffman tree corresponding to the first category.
5. The data file compression method according to claim 4, characterized by further comprising:
when the local system is idle, recompressing the compressed data in the first category according to the optimized Huffman tree.
6. A data file compression apparatus based on Huffman coding, characterized by comprising:
a data category determination unit, configured to, when a data file compression request is received, determine, according to a preset data classification rule, the category to which the data to be encoded or decoded belongs, to obtain a data category;
a Huffman tree determination unit, configured to determine the Huffman tree corresponding to the data category according to pre-stored mapping data, to obtain a specified Huffman tree; wherein the mapping data stores the mapping relationships, obtained through big data analysis, between each data category and its corresponding Huffman tree;
a codec unit, configured to encode or decode the specified data according to the specified Huffman tree.
7. The data file compression apparatus according to claim 6, characterized in that the Huffman tree determination unit comprises:
an identifier determination unit, configured to obtain, from the mapping data, the Huffman tree identifier corresponding to the data category, to obtain a specified identifier;
a tree identifier extraction unit, configured to extract the identifier of each Huffman tree in the Huffman tree storage space, to obtain the identifier of each Huffman tree;
an identifier matching unit, configured to match the specified identifier against the identifier of each Huffman tree, and to take the successfully matched Huffman tree as the Huffman tree corresponding to the data category.
8. The data file compression apparatus according to claim 6, characterized by further comprising: a maintenance optimization unit, configured to, when the amplitude of variation of the data in a first category exceeds the corresponding variation threshold, perform maintenance optimization on the Huffman tree of the first category according to the data changes of the first category, to obtain the optimized Huffman tree corresponding to the first category.
9. A data file compression device, characterized by comprising:
a memory, configured to store a program;
a processor, configured to implement the steps of the data file compression method according to any one of claims 1 to 5 when executing the program.
10. A readable storage medium, characterized in that a program is stored on the readable storage medium, and when the program is executed by a processor, the steps of the data file compression method according to any one of claims 1 to 5 are implemented.
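The priority rule recited in claim 3 can be illustrated with a short Python sketch. The `TreeEntry` record, the numeric priority values and the function names are assumptions for illustration; the claim only requires that a user-uploaded tree rank above the default tree.

```python
from dataclasses import dataclass
from typing import Dict, List

DEFAULT_PRIORITY = 1   # first priority: trees built from big data analysis
UPLOADED_PRIORITY = 2  # second priority: user-uploaded trees (ranked higher)


@dataclass
class TreeEntry:
    tree_id: str
    priority: int


def register(mapping: Dict[str, List[TreeEntry]],
             category: str, tree_id: str, priority: int) -> None:
    """Record a tree identifier for a category in the mapping data."""
    mapping.setdefault(category, []).append(TreeEntry(tree_id, priority))


def select_tree_id(mapping: Dict[str, List[TreeEntry]], category: str) -> str:
    """Return the identifier of the highest-priority tree for the category."""
    return max(mapping[category], key=lambda entry: entry.priority).tree_id


mapping: Dict[str, List[TreeEntry]] = {}
register(mapping, "text_log", "default-log-tree", DEFAULT_PRIORITY)
register(mapping, "table", "default-table-tree", DEFAULT_PRIORITY)
register(mapping, "text_log", "user-log-tree", UPLOADED_PRIORITY)
print(select_tree_id(mapping, "text_log"), select_tree_id(mapping, "table"))
```

A category with no uploaded tree simply falls back to its default tree, because that is the only entry registered for it.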
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910295185.6A CN110008192A (en) | 2019-04-12 | 2019-04-12 | Data file compression method, apparatus, device and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110008192A true CN110008192A (en) | 2019-07-12 |
Family
ID=67171491
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910295185.6A Pending CN110008192A (en) | Data file compression method, apparatus, device and readable storage medium | 2019-04-12 | 2019-04-12 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110008192A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111147861A (en) * | 2020-01-02 | 2020-05-12 | 广州虎牙科技有限公司 | Image compression method, device, user equipment and computer readable storage medium |
CN112399479A (en) * | 2020-11-03 | 2021-02-23 | 广州机智云物联网科技有限公司 | Method, electronic device and storage medium for data transmission |
CN112417815A (en) * | 2020-11-18 | 2021-02-26 | 红有软件股份有限公司 | Dynamic coding method for category combined data in big data processing |
CN112580676A (en) * | 2019-09-29 | 2021-03-30 | 北京京东振世信息技术有限公司 | Clustering method, clustering device, computer readable medium and electronic device |
CN112948432A (en) * | 2019-12-11 | 2021-06-11 | 中国电信股份有限公司 | Data processing method and data processing device |
CN113282776A (en) * | 2021-07-12 | 2021-08-20 | 北京蔚领时代科技有限公司 | Data processing system for graphics engine resource file compression |
CN113708772A (en) * | 2021-08-11 | 2021-11-26 | 山东云海国创云计算装备产业创新中心有限公司 | Huffman coding method, system, device and readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090045991A1 (en) * | 2007-08-15 | 2009-02-19 | Red Hat, Inc. | Alternative encoding for lzss output |
CN104978319A (en) * | 2014-04-02 | 2015-10-14 | 东华软件股份公司 | Method and equipment used for classified transmission of files |
CN106357275A (en) * | 2016-08-30 | 2017-01-25 | 国网冀北电力有限公司信息通信分公司 | Huffman compression method and device |
CN107257426A (en) * | 2017-06-19 | 2017-10-17 | 成都优孚达信息技术有限公司 | A kind of data compression method for reducing resource consumption |
CN107565971A (en) * | 2017-09-07 | 2018-01-09 | 华为技术有限公司 | A kind of data compression method and device |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112580676A (en) * | 2019-09-29 | 2021-03-30 | 北京京东振世信息技术有限公司 | Clustering method, clustering device, computer readable medium and electronic device |
CN112580676B (en) * | 2019-09-29 | 2024-08-20 | 北京京东振世信息技术有限公司 | Clustering method, clustering device, computer readable medium and electronic equipment |
CN112948432A (en) * | 2019-12-11 | 2021-06-11 | 中国电信股份有限公司 | Data processing method and data processing device |
CN112948432B (en) * | 2019-12-11 | 2023-10-13 | 天翼云科技有限公司 | Data processing method and data processing device |
CN111147861A (en) * | 2020-01-02 | 2020-05-12 | 广州虎牙科技有限公司 | Image compression method, device, user equipment and computer readable storage medium |
CN112399479A (en) * | 2020-11-03 | 2021-02-23 | 广州机智云物联网科技有限公司 | Method, electronic device and storage medium for data transmission |
CN112399479B (en) * | 2020-11-03 | 2023-04-07 | 广州机智云物联网科技有限公司 | Method, electronic device and storage medium for data transmission |
CN112417815A (en) * | 2020-11-18 | 2021-02-26 | 红有软件股份有限公司 | Dynamic coding method for category combined data in big data processing |
CN112417815B (en) * | 2020-11-18 | 2024-01-23 | 红有软件股份有限公司 | Dynamic coding method for class combination data in big data processing |
CN113282776A (en) * | 2021-07-12 | 2021-08-20 | 北京蔚领时代科技有限公司 | Data processing system for graphics engine resource file compression |
CN113708772A (en) * | 2021-08-11 | 2021-11-26 | 山东云海国创云计算装备产业创新中心有限公司 | Huffman coding method, system, device and readable storage medium |
CN113708772B (en) * | 2021-08-11 | 2024-07-26 | 山东云海国创云计算装备产业创新中心有限公司 | Huffman coding method, system, device and readable storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190712 |