CN115269940A - Data compression method of ERP management system - Google Patents

Info

Publication number
CN115269940A
Authority
CN
China
Prior art keywords
data
compression
interval
entropy
reducible
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211206424.4A
Other languages
Chinese (zh)
Other versions
CN115269940B (en)
Inventor
葛平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiazhuo Intelligent Technology Nantong Co ltd
Original Assignee
Jiazhuo Intelligent Technology Nantong Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiazhuo Intelligent Technology Nantong Co ltd
Priority to CN202211206424.4A
Publication of CN115269940A
Application granted
Publication of CN115269940B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90348 Query processing by searching ordered data, e.g. alpha-numerically ordered data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9035 Filtering based on additional data, e.g. user or group profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to the technical field of digital data processing, and in particular to a data compression method of an ERP management system. The method collects the data to be compressed of the ERP management system; acquires a plurality of compression intervals of the data to be compressed and the average repeatability of each; partitions all the data again to obtain a plurality of data intervals; screens out reducible entropy data and establishes a corresponding ideal compression model; acquires position adjustment parameters and direction adjustment parameters based on the difference between the arrangement positions of the reducible entropy data and the ideal compression model; adjusts the reducible entropy data with these parameters under different arrangement sequences and selects a sequence adjustment parameter; and adjusts the distribution of the reducible entropy data using the position adjustment parameter, the direction adjustment parameter and the sequence adjustment parameter to obtain an adjusted entropy reduction model, which is then compressed. The invention improves compression efficiency and achieves efficient compression of ERP management system data.

Description

Data compression method of ERP management system
Technical Field
The invention relates to the technical field of digital data processing, in particular to a data compression method of an ERP management system.
Background
When an ERP management system manages information, it relies on a large amount of data, and that data is usually compressed to reduce the volume that has to be transmitted and stored. The most common data compression algorithm in the prior art is GZIP: the data is first compressed with LZ77 coding, and the LZ77 output is then compressed a second time with Huffman coding. In essence, this approach exploits the repetition in the data so that the encoded size approaches the limit set by the information entropy.
Because an ERP management system integrates many kinds of information, little of its data is structured and the repeatability of the data is low; that is, the information entropy of the data as a whole is high. When LZ77 coding is used to compress ERP management system data, repeated data often does not fall within the same compression dictionary, or falls within the same compression window but at a relatively large distance, so the compression efficiency is low and a good compression result cannot be achieved.
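The window effect described above can be observed with a small experiment. The following is a minimal sketch, not part of the patent: it assumes Python's standard `zlib` module (DEFLATE, i.e. LZ77 plus Huffman coding, with a 32 KiB window by default) and uses pseudo-random bytes as a hypothetical stand-in for low-redundancy ERP records. The same 1000-byte block is repeated once close to the original and once beyond the window.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable


def random_bytes(n: int) -> bytes:
    # Pseudo-random bytes: a hypothetical stand-in for low-redundancy records.
    return bytes(rng.getrandbits(8) for _ in range(n))


block = random_bytes(1000)     # the record that repeats
filler = random_bytes(40000)   # unrelated data, longer than the 32 KiB window

near = block + block + filler  # the repeat sits right after the original
far = block + filler + block   # the repeat lies beyond the LZ77 window

near_size = len(zlib.compress(near))
far_size = len(zlib.compress(far))
print(near_size, far_size)
```

Both inputs contain exactly the same bytes; only the arrangement differs. The arrangement in which the repeat lies inside the window compresses to fewer bytes, which is the effect the method aims to exploit by rearranging data.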
Disclosure of Invention
In order to solve the problem that the compression effect is not ideal due to the overlarge information entropy of the ERP management system, the invention provides a data compression method of the ERP management system, and the adopted technical scheme is as follows:
an embodiment of the present invention provides a data compression method of an ERP management system, including the following steps:
collecting data to be compressed of an ERP management system;
dividing the data to be compressed into intervals to obtain a plurality of compression intervals, and obtaining the average repeatability of each compression interval from the information entropy of the repeated data of different lengths within it; re-partitioning all data based on the average repeatability of all data in a plurality of consecutive compression intervals to obtain a plurality of data intervals;
acquiring the distribution characteristics of the repeated data in each data interval, taking the average value of all the distribution characteristics as a screening threshold value, and screening out the repeated data corresponding to the distribution characteristics larger than the screening threshold value as the reducible entropy data; establishing an ideal compression model of the reducible entropy data;
acquiring corresponding position adjustment parameters based on the difference between the arrangement position of the reducible entropy data and the ideal compression model; acquiring corresponding direction adjustment parameters based on the positive and negative of the accumulated value of the difference; adjusting the reducible entropy data by using corresponding position adjustment parameters and direction adjustment parameters according to different arrangement sequences, and selecting the arrangement sequence which is most similar to the ideal compression model after adjustment as a corresponding sequence adjustment parameter;
and performing distribution adjustment on the reducible entropy data by using the position adjustment parameter, the direction adjustment parameter and the sequence adjustment parameter to obtain an adjusted entropy reduction model, and compressing the entropy reduction model.
Preferably, the interval division of the data to be compressed includes:
dividing all data to be compressed into a plurality of compression intervals, taking the length of the compression window in LZ77 coding as the unit of interval division.
Preferably, the method for obtaining the average repeatability includes:
calculating the information entropy of the repeated data of each length in the compression interval, and obtaining the average repeatability of that compression interval from the information entropies for all the different lengths and the length of the compression interval.
Preferably, the re-partitioning of all data based on the average repeatability of all data of a plurality of consecutive compression intervals includes:
the average repeatability of the first compression interval is denoted $C_1$; the average repeatability of all data in the first and second compression intervals is obtained and denoted $C_2$; if $C_1$ and $C_2$ satisfy the continuation condition [inequality image missing in the source], the average repeatability $C_3$ of all data in the first, second and third compression intervals is calculated, and so on, until, on reaching the $j$-th consecutive compression interval, the stopping condition [inequality image missing in the source] holds; all data of the first $j$ consecutive compression intervals are then taken as the first data interval;
the average repeatability is then calculated afresh starting from the $(j+1)$-th compression interval, until all data intervals of the data to be compressed are obtained.
Preferably, the method for acquiring the distribution characteristics includes:
for any data interval, calculating, for each item of repeated data in the data interval, the distances between its successive occurrences, calculating the information entropy of those distances, and obtaining the sum of the information entropies over all of its repeated occurrences; calculating the proportion of the total length of the data interval taken up by the repeated data and using it as the weight of the summation result; the product obtained is a characteristic index, the characteristic index is used as the exponent of a preset value, and the result of that exponential function is the distribution characteristic.
Preferably, the process of establishing the ideal compression model of the reducible entropy data is as follows:
performing simulated compression on all data in the data interval in which each item of reducible entropy data lies, taking the length of a compression interval as the length of the sliding compression window; during the simulated compression, whenever reducible entropy data that cannot be compressed is encountered, adjusting the position of its next occurrence so that it can just be compressed; and traversing the whole data interval to obtain the corresponding ideal compression model.
The embodiment of the invention at least has the following beneficial effects:
first, the average repeatability of the data to be compressed is calculated and used to partition all the data again, so that as much repeated data as possible falls within the same data interval, which improves the efficiency of the subsequent compression; next, the reducible entropy data in the data intervals is screened: whether the data in each data interval needs entropy reduction is judged, and the data that does is selected; then, by establishing an ideal compression model of the reducible entropy data, the corresponding position adjustment parameters, direction adjustment parameters and sequence adjustment parameters are obtained, and the distribution of the reducible entropy data is adjusted to obtain an adjusted entropy reduction model, so that repeated data lies within the same compression window as far as possible during compression. Compressing the entropy reduction model improves compression efficiency, shortens retrieval time, and achieves efficient compression of ERP management system data.
Drawings
In order to illustrate the embodiments of the present invention, or the technical solutions and advantages of the prior art, more clearly, the drawings used in the description of the embodiments or of the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart illustrating steps of a data compression method of an ERP management system according to an embodiment of the present invention.
Detailed Description
To further explain the technical means adopted by the present invention to achieve its intended objects, and their effects, a data compression method of an ERP management system according to the present invention, together with its specific implementation, structure, features and effects, is described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following describes a specific scheme of the data compression method of the ERP management system provided by the present invention in detail with reference to the accompanying drawings.
Referring to fig. 1, a flowchart illustrating steps of a data compression method of an ERP management system according to an embodiment of the present invention is shown, where the method includes the following steps:
Step S001, collecting data to be compressed of the ERP management system.
The data to be compressed is screened and collected from the database of the ERP management system, either by manual selection or by automatic selection by the system.
Step S002, dividing the data to be compressed into intervals to obtain a plurality of compression intervals, and obtaining the average repeatability of each compression interval from the information entropy of the repeated data of different lengths within it; and re-partitioning all the data based on the average repeatability of all data of a plurality of consecutive compression intervals to obtain a plurality of data intervals.
The method comprises the following specific steps:
1. Dividing the data to be compressed into intervals to obtain a plurality of compression intervals.
All data to be compressed is divided into a plurality of compression intervals, taking the length of the compression window in LZ77 coding as the unit of interval division.
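The division step can be sketched as follows. This is an illustrative sketch only: the function name `split_into_intervals` is hypothetical, and the default window of 32 KiB is an assumption taken from DEFLATE's maximum window size, since the source does not state a value.

```python
def split_into_intervals(data: bytes, window: int = 32 * 1024) -> list[bytes]:
    """Cut `data` into consecutive compression intervals of `window` bytes.

    The last interval may be shorter than `window`.
    ASSUMPTION: 32 KiB default, borrowed from DEFLATE; not stated in the source.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [data[i:i + window] for i in range(0, len(data), window)]


print(split_into_intervals(b"abcdefgh", window=3))
```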
2. The average repeatability of each compression interval is obtained.
The information entropy of the repeated data of each length in the compression interval is calculated, and the average repeatability of that compression interval is obtained from the information entropies for all the different lengths and the length of the compression interval.
The length of the longest repeated data in a compression interval is counted and denoted $B$. Taking one compression interval as an example, the corresponding average repeatability $C$ is calculated as:

[formula image missing in the source]

where $b$ is the length of the repeated data, with $b$ ranging over the lengths up to $B$, the length of the longest repeated data in the current compression interval; for each length $b$, the formula considers each distinct piece of data of that length, up to the maximum number of distinct pieces of data of that length, together with the probability of each distinct piece; the information entropy of the data of length $b$ is computed from those probabilities; and $A$ denotes the length of the compression interval.
The information entropy is calculated with the well-known formula. Each time the repetition of data of one length is calculated, the whole interval has to be traversed once, so the calculation is carried out once for every length up to $B$, and the average repeatability is then obtained with [denominator image missing in the source] as the denominator.
Within a compression interval, the repeatability of the data is calculated from the information entropy: the more the data of the different lengths repeats, the larger the value of the corresponding term [term image missing in the source]. The information entropies for all the different lengths are then accumulated and averaged to give the average repeatability $C$ of the data within the window. The larger $C$ is, the more likely it is that, for any length $b$ up to $B$, the corresponding data is repeated.
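As an illustration of the per-length entropy term only, the sketch below computes the Shannon entropy of the distribution of all length-$b$ substrings of an interval. Several things here are assumptions rather than the patent's definition: the function name `substring_entropy` is hypothetical, overlapping substrings are counted, the logarithm is taken to base 2, and the way the source combines these entropies with $A$ and $B$ into the average repeatability $C$ is not reproduced because the formula image is missing.

```python
from collections import Counter
from math import log2


def substring_entropy(data: bytes, b: int) -> float:
    """Shannon entropy (in bits) of the distribution of length-b substrings.

    ASSUMPTION: overlapping substrings and log base 2; the source's exact
    counting rule is not recoverable from the extraction.
    """
    counts = Counter(data[i:i + b] for i in range(len(data) - b + 1))
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


# A fully repetitive interval has zero entropy for length 1;
# two equally frequent symbols give exactly one bit.
print(substring_entropy(b"aaaa", 1), substring_entropy(b"abab", 1))
```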
3. Re-partitioning all the data based on the average repeatability of all data of a plurality of consecutive compression intervals to obtain a plurality of data intervals.
The average repeatability of the first compression interval is denoted $C_1$. The average repeatability of all data in the first and second compression intervals is obtained and denoted $C_2$. If $C_1$ and $C_2$ satisfy the continuation condition [inequality image missing in the source], the average repeatability $C_3$ of all data in the first, second and third compression intervals is calculated, and so on, until, on reaching the $j$-th consecutive compression interval, the stopping condition [inequality image missing in the source] holds. All data of the first $j$ consecutive compression intervals are taken as the first data interval, written with a subscript that marks it as the first data interval and a superscript that gives the number of pieces of data it contains [symbol images missing in the source]. The average repeatability is then calculated afresh starting from the $(j+1)$-th compression interval, until all data intervals of the data to be compressed are obtained [set notation image missing in the source], where the subscript is the index of the data interval and runs from 1 to the total number of data intervals.
It should be noted that the lengths of the data intervals may not be equal; for convenience of description, they are all written with the same length symbol.
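The greedy re-partitioning can be sketched as below. The inequalities that decide when to keep extending and when to stop are images in the source and are not recoverable, so this sketch assumes one plausible rule: keep merging while the score of the merged data strictly increases, and start a new data interval at the compression interval that fails to raise it. The names `merge_intervals` and `toy_score` are hypothetical, and `toy_score` is only a stand-in for the average repeatability.

```python
from collections import Counter
from typing import Callable, Sequence


def merge_intervals(intervals: Sequence[bytes],
                    score: Callable[[bytes], float]) -> list[bytes]:
    """Greedily merge consecutive compression intervals into data intervals."""
    merged = []
    start = 0
    while start < len(intervals):
        current = intervals[start]
        best = score(current)
        end = start + 1
        # ASSUMPTION: extend the run while the merged score strictly increases.
        while end < len(intervals):
            candidate = current + intervals[end]
            candidate_score = score(candidate)
            if candidate_score <= best:
                break
            current, best = candidate, candidate_score
            end += 1
        merged.append(current)
        start = end  # restart from the interval that stopped the run
    return merged


def toy_score(data: bytes) -> float:
    # Hypothetical stand-in for the average repeatability:
    # the share of the interval taken by its most common byte.
    return max(Counter(data).values()) / len(data)


data_intervals = merge_intervals([b"aab", b"aaa", b"xyz", b"xxz"], toy_score)
print(data_intervals)
```

Passing the score as a parameter keeps the merging logic separate from the repeatability measure, so a faithful implementation of the source's formula could be substituted without touching the loop.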
Step S003, acquiring the distribution characteristics of the repeated data in each data interval, taking the average value of all the distribution characteristics as a screening threshold, and screening out the repeated data whose distribution characteristics are greater than the screening threshold as reducible entropy data; and establishing an ideal compression model of the reducible entropy data.
The method comprises the following specific steps:
1. Acquiring the distribution characteristics of the repeated data in each data interval.
For any data interval, the distances between successive occurrences of each item of repeated data in the data interval are calculated, the information entropy of those distances is calculated, and the sum of the information entropies over all of its repeated occurrences is obtained; the proportion of the total length of the data interval taken up by the repeated data is calculated and used as the weight of the summation result; the product obtained is a characteristic index, the characteristic index is used as the exponent of a preset value, and the result of that exponential function is the distribution characteristic.
Take one item of repeated data in one data interval as an example, where the index of the item runs from 1 to the total number of repeated data items in that data interval. First, the total number of times $S$ that the item is repeated in the data interval is counted, and the position of each of its occurrences is recorded. The distance between each pair of adjacent occurrences is then calculated: taking one occurrence and the next as an example, the distance is the difference between the position of the next occurrence and the position of the current occurrence within the data interval [formula image missing in the source].
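Recording the occurrence positions and the distances between adjacent occurrences can be sketched as follows. The function names are hypothetical, and two details are assumptions: occurrences are taken to be non-overlapping, and the distance is taken as the later position minus the earlier one.

```python
def occurrence_positions(data: bytes, pattern: bytes) -> list[int]:
    """Start offsets of every non-overlapping occurrence of `pattern`."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    positions = []
    i = data.find(pattern)
    while i != -1:
        positions.append(i)
        i = data.find(pattern, i + len(pattern))
    return positions


def adjacent_distances(positions: list[int]) -> list[int]:
    """Distance between each occurrence and the next one."""
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


positions = occurrence_positions(b"abXXabXab", b"ab")
print(positions, adjacent_distances(positions))
```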
The distribution characteristic of the repeated data item in the data interval is then obtained from the distances:

[formula image missing in the source]

where the weight of the summation result is the ratio of the data length of the repeated data item to the total length of the data interval; the summation result is the sum of all the information entropies corresponding to the repeated occurrences of the item; and $e$ is the natural constant, which is the preset value in this embodiment of the invention.
The summation result quantifies the overall variation in the distance between adjacent occurrences of the repeated data item within the data interval. The more regular the positions at which the item appears in the data interval, the smaller the summation result, the fewer positions the subsequent entropy reduction function has to adjust, and the smaller the amount of calculation.

[formula image for the summation result missing in the source]

where $S$ represents the number of repetitions of the repeated data item, and the probability term is the probability with which a given distance between adjacent occurrences occurs.
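The idea that regular spacing yields a small entropy can be illustrated with a short sketch. It computes the Shannon entropy of the empirical distribution of distances; the function name `distance_entropy` and the base-2 logarithm are assumptions, and the exact summation used in the source is not reproduced because its formula image is missing.

```python
from collections import Counter
from math import log2


def distance_entropy(distances: list[int]) -> float:
    """Shannon entropy (in bits) of the observed distances between occurrences.

    ASSUMPTION: log base 2; the source does not state the base.
    """
    if not distances:
        return 0.0
    counts = Counter(distances)
    total = len(distances)
    return -sum(c / total * log2(c / total) for c in counts.values())


# Perfectly regular spacing gives zero entropy; irregular spacing gives more.
print(distance_entropy([5, 5, 5, 5]), distance_entropy([3, 9]))
```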
The ratio of the data length of the repeated data item to the total length of the data interval is [formula image missing in the source], formed from the length of the repeated data item, the number of times it is repeated, and the total length of the data interval in which it lies. This ratio is used as the weight. The larger the ratio, the larger the share of the data interval, by length or by number of repetitions, that the item accounts for; if the item is compressed as far as possible by adjusting its distribution positions, it contributes more to the compression rate of the data in that data interval. The ratio is therefore used as the weight when quantifying the distance characteristic.
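A minimal sketch of the weight follows. It assumes the ratio is the length of the repeated item multiplied by its number of repetitions, divided by the total length of the data interval, which is how the three quantities named in the text combine into a share of the interval; the function and parameter names are hypothetical.

```python
def repeat_weight(item_length: int, repetitions: int, interval_length: int) -> float:
    """Share of a data interval occupied by all occurrences of one repeated item.

    ASSUMPTION: weight = item_length * repetitions / interval_length.
    """
    if interval_length <= 0:
        raise ValueError("interval_length must be positive")
    return item_length * repetitions / interval_length


# A 4-byte item repeated 5 times in a 100-byte interval occupies a fifth of it.
print(repeat_weight(4, 5, 100))
```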
The distribution characteristic of each repeated data item in a data interval is used to judge whether an entropy reduction function needs to be calculated for that item in order to adjust its distribution positions. The more often an item is repeated in the data interval, the longer the data is each time it is repeated, and the more regular its distribution positions, the more it contributes to the compression rate of the data interval under ideal compression, and the greater the need to calculate its entropy reduction function.
The distribution characteristics of all the repeated data in the data interval are calculated in this way.
2. Screening the reducible entropy data.
The average value of all the distribution characteristics is taken as the screening threshold, and the repeated data whose distribution characteristics are greater than the screening threshold are screened out as the reducible entropy data.
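The screening rule can be sketched directly. The sketch assumes the distribution characteristics have already been computed and are held in a dictionary keyed by the repeated data item; the function name `screen_reducible` and the sample values are hypothetical.

```python
from statistics import mean


def screen_reducible(features: dict[bytes, float]) -> list[bytes]:
    """Keep the repeated items whose distribution characteristic exceeds the mean."""
    if not features:
        return []
    threshold = mean(features.values())
    return [item for item, value in features.items() if value > threshold]


# Hypothetical characteristics for three repeated items; the mean is 2.0.
sample = {b"ab": 3.0, b"cd": 1.0, b"ef": 2.0}
print(screen_reducible(sample))
```

Because the threshold is the mean of the characteristics themselves, the rule needs no tuning parameter, and items exactly at the mean are not retained.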
The distribution characteristic of each repeated data item is compared with the screening threshold, and the repeated data whose distribution characteristic is greater than the screening threshold is retained. For repeated data whose distribution characteristic is below the screening threshold, processing with the entropy reduction model would require too much subsequent calculation relative to its contribution to the compression rate of the data interval, so entropy reduction model processing is considered unnecessary for it.
This completes the screening of all the repeated data in the data interval that needs to take part in entropy reduction model processing.
3. Establishing an ideal compression model of the reducible entropy data.
Simulated compression is performed on all data in the data interval in which each item of reducible entropy data lies, taking the length of a compression interval as the length of the sliding compression window. During the simulated compression, whenever reducible entropy data that cannot be compressed is encountered, the position of its next occurrence is adjusted so that it can just be compressed. Traversing the whole data interval in this way gives the corresponding ideal compression model.
After all the repeated data of the data interval that needs to take part in entropy reduction model processing has been screened out as the reducible entropy data, an entropy reduction model is established for each distinct item of reducible entropy data, using the partitioned data and the length of the sliding compression window. An ideal compression model is established first, such that when it is compressed with the sliding compression window, all the reducible entropy data can be compressed. An entropy reduction function for the reducible entropy data is then established from the ideal compression model and the actual data distribution of the data interval, and is used to adjust the distribution positions of the reducible entropy data so that, under the action of the entropy reduction function, the distribution of all data in the data interval is as close as possible to the ideal compression model.
First, an ideal compression model is established from the arrangement of the data in the data interval, the length of the sliding compression window and the reducible entropy data. In the ideal compression model, the maximum amount of reducible entropy data is compressed at every step as the sliding dictionary moves. It is established as follows:
simulated compression is first performed on all data in the data interval, taking the length of the compression interval as the length of the sliding compression window; during the simulated compression, whenever reducible entropy data that cannot be compressed is encountered, the position of its next occurrence is adjusted so that it can just be compressed.
For example, suppose a data interval contains a certain segment of data [data sequence image missing in the source], in which two items are reducible entropy data [symbol images missing in the source]. During compression, the second of these items cannot be compressed, so the position at which it appears is adjusted once, just far enough for it to be compressed. The adjusted ideal compression model is [data sequence image missing in the source], in which that item can just be compressed.
Simulated compression is performed on all data in the data interval in this way to obtain the ideal compression model of the data interval. The ideal compression model contains exactly the same data as the data interval; only the arrangement positions differ.
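The detection half of the simulated compression can be sketched as below; the repositioning itself is not attempted here. The sketch assumes the data interval is represented as a list of tokens and that an occurrence is "compressible" when its previous occurrence lies at most `window` tokens back. The function name, the token representation and the sample sequences are all hypothetical.

```python
def out_of_window_repeats(tokens: list[str], reducible: set[str], window: int) -> list[int]:
    """Indices of reducible tokens whose previous occurrence is outside the window."""
    last_seen: dict[str, int] = {}
    misses = []
    for i, tok in enumerate(tokens):
        if tok in reducible and tok in last_seen and i - last_seen[tok] > window:
            misses.append(i)  # this repeat could not be matched by the sliding window
        last_seen[tok] = i
    return misses


actual = ["A", "x", "y", "A", "z", "w", "v", "A"]
# The same tokens with the last "A" moved one place earlier, so it just fits.
ideal = ["A", "x", "y", "A", "z", "w", "A", "v"]

print(out_of_window_repeats(actual, {"A"}, window=3))
print(out_of_window_repeats(ideal, {"A"}, window=3))
```

In the second arrangement no repeat falls outside the window, while the token multiset is unchanged, which matches the property that the ideal compression model differs from the data interval only in arrangement.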
Whether each repeated data item needs to take part in the entropy reduction model is judged from its distribution characteristic, and the positions of the repeated data with larger distribution characteristics are adjusted by calculating an entropy reduction model function, so that repeated data lies within the same sliding window as far as possible when the sliding compression window slides, which improves compression efficiency.
Step S004, acquiring corresponding position adjustment parameters based on the difference between the arrangement position of the reducible entropy data and the ideal compression model; acquiring corresponding direction adjustment parameters based on the positive and negative of the accumulated value of the difference; and adjusting the reducible entropy data by using the corresponding position adjustment parameters and direction adjustment parameters according to different arrangement sequences, and selecting the arrangement sequence which is most similar to the ideal compression model after adjustment as the corresponding sequence adjustment parameters.
Position adjustment requires three things to be determined: the adjustment amount, the adjustment direction, and the adjustment order. Because several pieces of repeated data are screened out, different adjustment orders produce different overall effects. The adjustment amount and direction for all screened positions are therefore derived from the position differences, and the entropy reduction function is then adapted according to the similarity achieved by different adjustment orders.
The method comprises the following specific steps:
1. Acquiring the position adjustment parameter of the reducible entropy data.
For each piece of reducible entropy data, the average of the differences between its arrangement positions in the data interval (index in Figure 324414DEST_PATH_IMAGE025) and its arrangement positions in the ideal compression model is calculated. Taking the repeated data (Figure 919289DEST_PATH_IMAGE030) with the index given in Figure 873842DEST_PATH_IMAGE029 in the data interval (Figure 263869DEST_PATH_IMAGE028) as an example, its average position difference (Figure 018832DEST_PATH_IMAGE056) is calculated as follows:
Figure 782520DEST_PATH_IMAGE058
where Figure 727342DEST_PATH_IMAGE035 denotes the position, in the data interval (index in Figure 044990DEST_PATH_IMAGE025), of the occurrence (occurrence index in Figure 566418DEST_PATH_IMAGE034) of the repeated data (index in Figure 800588DEST_PATH_IMAGE029), with the occurrence index ranging as given in Figure 065664DEST_PATH_IMAGE036; Figure 538234DEST_PATH_IMAGE033 is the total number of occurrences of the repeated data (Figure 452280DEST_PATH_IMAGE030); and Figure DEST_PATH_IMAGE059 denotes the position of the same occurrence of the same repeated data in the ideal compression model corresponding to that data interval.
The average position difference (Figure 942747DEST_PATH_IMAGE060), taken over the same occurrences of all screened repeated data in the data interval (index in Figure 592220DEST_PATH_IMAGE025) and of the corresponding repeated data in its ideal compression model, is used as the position adjustment parameter in the entropy reduction function.
The average difference reflects the overall trend of the position differences between all screened repeated data and the ideal compression model: apart from a small portion of the data, the position differences between the screened repeated data in the data interval (index in Figure 211048DEST_PATH_IMAGE025) and the corresponding data in the ideal compression model fluctuate around the average. Using the average difference (Figure 907609DEST_PATH_IMAGE060) as the position adjustment parameter requires less computation and gives a more accurate position adjustment.
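As a rough illustration of the position adjustment parameter, the sketch below averages the position differences between matching occurrences in a data interval and in its ideal compression model. The formula in the source is an image that did not survive extraction, so two choices here are assumptions: absolute differences are averaged (keeping the magnitude separate from the direction parameter), and the differences of all screened items are pooled into one mean. The names `positions` and `position_adjustment` are hypothetical.

```python
def positions(tokens, item):
    """Indices at which `item` occurs in `tokens`, in order."""
    return [i for i, tok in enumerate(tokens) if tok == item]


def position_adjustment(interval, ideal, items):
    """Mean absolute position difference between the n-th occurrence of
    each item in `interval` and its n-th occurrence in `ideal`.
    Assumption: absolute values, pooled over all items."""
    diffs = []
    for item in items:
        actual = positions(interval, item)
        target = positions(ideal, item)
        if len(actual) != len(target):
            raise ValueError("interval and ideal model must hold the same data")
        diffs.extend(abs(a - t) for a, t in zip(actual, target))
    if not diffs:
        return 0.0
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    interval = ["x", "x", "A", "x", "A"]
    ideal = ["A", "x", "A", "x", "x"]
    print(position_adjustment(interval, ideal, ["A"]))
```

Because the ideal compression model is a rearrangement of the same data, every item occurs equally often in both sequences, which is what the length check relies on.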
2. Acquiring the direction adjustment parameter of the reducible entropy data.
Because the position of a given occurrence of a piece of repeated data in the data interval differs from the position of the same occurrence in the ideal compression model, each calculated difference (Figure DEST_PATH_IMAGE061) may be positive or negative. The differences are accumulated (Figure 290125DEST_PATH_IMAGE062), and the sign of the accumulated value (Figure DEST_PATH_IMAGE063) is then examined. If the sign is positive, most occurrences of the repeated data in the ideal compression model lie ahead of the corresponding occurrences in the data interval (index in Figure 128768DEST_PATH_IMAGE025), so the data should be moved forward when the data interval is adjusted by the entropy reduction function, which brings the data interval closer to the ideal compression model. Otherwise, if the sign is negative, most occurrences in the ideal compression model lie behind the corresponding occurrences in the data interval, so the data should be moved backward to bring the data interval closer to the ideal compression model.
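The sign test can be sketched as follows, assuming each difference is "position in the data interval minus position in the ideal compression model", which matches the statement that a positive sum means the ideal positions lie ahead. Pooling the sum over all listed items and returning "none" for a zero sum are assumptions, since the source does not cover those points. The name `adjustment_direction` is hypothetical.

```python
def adjustment_direction(interval, ideal, items):
    """Return "forward", "backward" or "none" from the sign of the
    accumulated signed position differences (interval minus ideal)."""
    total = 0
    for item in items:
        actual = [i for i, tok in enumerate(interval) if tok == item]
        target = [i for i, tok in enumerate(ideal) if tok == item]
        # Pair the n-th occurrence in the interval with the n-th in the model.
        total += sum(a - t for a, t in zip(actual, target))
    if total > 0:
        return "forward"    # ideal positions are earlier: move data forward
    if total < 0:
        return "backward"   # ideal positions are later: move data backward
    return "none"           # assumption: zero sum means no preferred direction


if __name__ == "__main__":
    interval = ["x", "x", "A", "x", "A"]
    ideal = ["A", "x", "A", "x", "x"]
    print(adjustment_direction(interval, ideal, ["A"]))
```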
3. Acquiring the sequence adjustment parameter of the reducible entropy data.
All reducible entropy data are adjusted in different orders using the position adjustment parameter and the direction adjustment parameter. The similarity between the adjusted data interval (index in Figure 426205DEST_PATH_IMAGE025) and its ideal compression model is then calculated, and the adjustment order with the highest similarity is selected as the sequence adjustment parameter.
Specifically: first, an adjustment order for the screened repeated data is chosen at random; then, following that order, the data distribution positions in the data interval (index in Figure 848090DEST_PATH_IMAGE025) are adjusted according to the position adjustment parameter and the direction adjustment parameter; after the adjustment is completed, the structural similarity is calculated between the data structure of the adjusted data interval, that is, the overall arrangement of all its data, and the structure of the ideal compression model. The higher the structural similarity, the closer the data structure of the data interval adjusted in that order is to the ideal compression model, and hence the higher the compression efficiency. The order with the highest structural similarity is selected as the sequence adjustment parameter.
It should be noted that the structural similarity is the probability that the data at the same position are identical, which is a prior-art measure.
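The order selection can be sketched as below, under several assumptions. Structural similarity is taken to be the fraction of positions holding identical data. Every order is tried exhaustively with `itertools.permutations`, which is only practical for a handful of items, whereas the source describes comparing candidate orders without saying how many are tried. The actual adjustment is supplied by the caller as `adjust`. The names `structural_similarity`, `select_order` and the toy move `swap_left` are hypothetical.

```python
from itertools import permutations


def structural_similarity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def select_order(interval, ideal, items, adjust):
    """Apply `adjust(seq, item)` for the items in every possible order and
    return the (order, similarity) pair closest to `ideal`."""
    best_order, best_score = None, -1.0
    for order in permutations(items):
        candidate = list(interval)
        for item in order:
            candidate = adjust(candidate, item)
        score = structural_similarity(candidate, ideal)
        if score > best_score:
            best_order, best_score = order, score
    return best_order, best_score


def swap_left(seq, item):
    """Toy adjustment: move the first occurrence of `item` one step forward."""
    out = list(seq)
    i = out.index(item)
    if i > 0:
        out[i - 1], out[i] = out[i], out[i - 1]
    return out


if __name__ == "__main__":
    print(select_order(["x", "A", "B"], ["A", "B", "x"], ["A", "B"], swap_left))
```

In the toy run, adjusting "A" before "B" reproduces the ideal arrangement, while the reverse order does not, which illustrates why the adjustment order is treated as a parameter of its own.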
Step S005, performing distribution adjustment on the reducible entropy data by using the position adjustment parameter, the direction adjustment parameter and the sequence adjustment parameter to obtain an adjusted entropy reduction model, and compressing the entropy reduction model.
The method comprises the following specific steps:
The data of each data interval are distribution-adjusted using the entropy reduction function of that interval. Specifically, following the sequence adjustment parameter of each data interval, the screened repeated data are adjusted one after another in combination with the position adjustment parameter and the direction adjustment parameter, which changes the distribution characteristics of the whole interval. The rearranged data of each data interval form the entropy reduction model of that interval. The entropy reduction function of each data interval is used as a header file and is compressed together with the entropy reduction model of that interval using GZip.
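One possible way to store the entropy reduction function as a header next to the entropy reduction model is sketched below with Python's standard `gzip` module. The container layout (a 4-byte big-endian length, a JSON header, then the rearranged bytes) and the parameter names in the example are assumptions made for illustration; the source only states that the function is used as a header file and that GZip is the compression method.

```python
import gzip
import json


def pack_interval(params, model_bytes):
    """Prefix the rearranged data with a JSON header holding the
    adjustment parameters, then GZip the whole payload."""
    header = json.dumps(params, sort_keys=True).encode("utf-8")
    payload = len(header).to_bytes(4, "big") + header + model_bytes
    return gzip.compress(payload)


def unpack_interval(blob):
    """Inverse of pack_interval: return (params, model_bytes)."""
    payload = gzip.decompress(blob)
    header_len = int.from_bytes(payload[:4], "big")
    params = json.loads(payload[4:4 + header_len].decode("utf-8"))
    return params, payload[4 + header_len:]


if __name__ == "__main__":
    # Hypothetical parameter names, for illustration only.
    params = {"position": 2, "direction": "forward", "order": ["A", "B"]}
    blob = pack_interval(params, b"AAxxBBxx" * 8)
    print(unpack_interval(blob)[0])
```

Keeping the parameters with the compressed data is what allows the original arrangement to be restored after decompression, since the entropy reduction model is a rearranged copy of the interval.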
In summary, the embodiment of the present invention collects data to be compressed from the ERP management system; divides the data to be compressed into a plurality of compression intervals and obtains the average repeatability of each compression interval from the information entropy of the repeated data of different lengths within it; re-partitions all data based on the average repeatability of all data in a plurality of consecutive compression intervals to obtain a plurality of data intervals; acquires the distribution characteristics of the repeated data in each data interval, takes the average of all distribution characteristics as a screening threshold, and screens out the repeated data whose distribution characteristics exceed the screening threshold as the reducible entropy data; establishes an ideal compression model of the reducible entropy data; acquires the position adjustment parameter from the difference between the arrangement positions of the reducible entropy data and the ideal compression model; acquires the direction adjustment parameter from the sign of the accumulated difference; adjusts the reducible entropy data in different orders using the position adjustment parameter and the direction adjustment parameter, and selects the order whose result is most similar to the ideal compression model as the sequence adjustment parameter; and performs distribution adjustment on the reducible entropy data using the position adjustment parameter, the direction adjustment parameter and the sequence adjustment parameter to obtain an adjusted entropy reduction model, which is then compressed.
The embodiment of the invention can improve the compression efficiency, reduce the retrieval time and realize the high-efficiency compression of the ERP management system data.
It should be noted that the order of the above embodiments of the present invention is for description only and does not indicate that any embodiment is better or worse than another. Specific embodiments have been described above. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same or similar parts in the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments.
The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, not to limit them; modifications of the technical solutions described in the foregoing embodiments, or equivalent replacements of some of their technical features, that do not cause the essence of the corresponding technical solutions to depart from the spirit of the technical solutions of the embodiments of the present application are all included in the scope of the present application.

Claims (6)

1. A data compression method of an ERP management system is characterized by comprising the following steps:
collecting data to be compressed of an ERP management system;
performing interval division on the data to be compressed to obtain a plurality of compression intervals, and obtaining the corresponding average repeatability according to the information entropy of repeated data of different lengths in each compression interval; re-partitioning all data based on the average repeatability of all data of a plurality of consecutive compression intervals to obtain a plurality of data intervals;
acquiring the distribution characteristics of the repeated data in each data interval, taking the average value of all the distribution characteristics as a screening threshold value, and screening out the repeated data corresponding to the distribution characteristics larger than the screening threshold value as the reducible entropy data; establishing an ideal compression model of the reducible entropy data;
acquiring corresponding position adjustment parameters based on the difference between the arrangement position of the reducible entropy data and the ideal compression model; acquiring corresponding direction adjustment parameters based on the positive and negative of the accumulated value of the difference; adjusting the reducible entropy data by using corresponding position adjustment parameters and direction adjustment parameters according to different arrangement sequences, and selecting the arrangement sequence which is most similar to the ideal compression model after adjustment as the corresponding sequence adjustment parameters;
and performing distribution adjustment on the reducible entropy data by using the position adjustment parameter, the direction adjustment parameter and the sequence adjustment parameter to obtain an adjusted entropy reduction model, and compressing the entropy reduction model.
2. The data compression method of an ERP management system according to claim 1, wherein the interval division of the data to be compressed includes:
dividing all data to be compressed into a plurality of compression intervals, with the length of the compression window used in LZ77 coding compression as the interval division unit.
3. The data compression method for the ERP management system according to claim 1, wherein the average repeatability obtaining method is as follows:
and calculating the information entropy of the repeated data with each length in the compression interval, and acquiring the average repeatability of the corresponding compression interval based on the information entropy corresponding to all different lengths and the length of the compression interval.
4. The data compression method of an ERP management system according to claim 1, wherein the repartitioning all the data based on the average repeatability of all the data of a plurality of consecutive compression intervals comprises:
denoting the average repeatability of the first compression interval as the quantity in Figure DEST_PATH_IMAGE001; obtaining the average repeatability of all data of the first compression interval and the second compression interval, denoted as the quantity in Figure 870537DEST_PATH_IMAGE002; if the condition in Figure DEST_PATH_IMAGE003 holds, continuing to calculate the average repeatability of all data of the first, second and third compression intervals (Figure 191928DEST_PATH_IMAGE004), and so on, until, for the number of consecutive compression intervals given in Figure DEST_PATH_IMAGE005, the condition in Figure 276690DEST_PATH_IMAGE006 holds; all data of those consecutive compression intervals (count in Figure 681126DEST_PATH_IMAGE005) are then used as the first data interval;
and then calculating the average repeatability again, starting from the (j+1)-th compression interval, until all data intervals of the data to be compressed are obtained.
5. The data compression method for the ERP management system according to claim 1, wherein the method for obtaining the distribution characteristics is:
for any data interval, calculating the distance between occurrences of any repeated data in the data interval each time the repeated data occurs, calculating the information entropy of the distances, and acquiring the sum of the information entropies corresponding to all the repeated data; and calculating the proportion of the data length of the repeated data in the total length of the data interval as the weight of the summation result, taking the obtained product as a characteristic index, using the characteristic index as the exponent of a preset value, and taking the resulting exponential function value as the distribution characteristic.
6. The data compression method of the ERP management system as claimed in claim 1, wherein the process of establishing the ideal compression model of the reducible entropy data is:
and performing simulation compression on all data in a data interval in which each piece of reducible entropy data is positioned by taking the length of a compression interval as the length of a sliding compression window, and in the process of simulation compression, when encountering incompressible reducible entropy data, adjusting the position of the next occurrence of the reducible entropy data to ensure that the reducible entropy data is just compressed, traversing the whole data interval to obtain a corresponding ideal compression model.
CN202211206424.4A 2022-09-30 2022-09-30 Data compression method of ERP management system Active CN115269940B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211206424.4A CN115269940B (en) 2022-09-30 2022-09-30 Data compression method of ERP management system

Publications (2)

Publication Number Publication Date
CN115269940A true CN115269940A (en) 2022-11-01
CN115269940B CN115269940B (en) 2022-12-13

Family

ID=83757927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211206424.4A Active CN115269940B (en) 2022-09-30 2022-09-30 Data compression method of ERP management system

Country Status (1)

Country Link
CN (1) CN115269940B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609491A (en) * 2012-01-20 2012-07-25 东华大学 Column-storage oriented area-level data compression method
CN114244373A (en) * 2022-02-24 2022-03-25 麒麟软件有限公司 LZ series compression algorithm coding and decoding speed optimization method
WO2022126902A1 (en) * 2020-12-18 2022-06-23 平安科技(深圳)有限公司 Model compression method and apparatus, electronic device, and medium
CN114956290A (en) * 2022-07-27 2022-08-30 江苏赛沐思环保科技有限公司 LZ 77-coding-based intelligent treatment method for industrial wastewater

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
唐红: "对LZ77压缩数据的不均一纠错编码", 《四川大学学报(工程科学版)》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116828070A (en) * 2023-08-28 2023-09-29 无锡市锡容电力电器有限公司 Intelligent power grid data optimization transmission method
CN116828070B (en) * 2023-08-28 2023-11-07 无锡市锡容电力电器有限公司 Intelligent power grid data optimization transmission method

Also Published As

Publication number Publication date
CN115269940B (en) 2022-12-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant