CN115964347A - Intelligent storage method for data of market supervision monitoring center - Google Patents


Info

Publication number
CN115964347A
Authority
CN
China
Prior art keywords
data
actual
actual data
value
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310253184.1A
Other languages
Chinese (zh)
Other versions
CN115964347B (en)
Inventor
刘福建
孔建彪
钱瑞娜
徐子栋
刘洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Heze Product Inspection And Testing Institute
Original Assignee
Heze Product Inspection And Testing Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Heze Product Inspection And Testing Institute
Priority to CN202310253184.1A
Publication of CN115964347A
Application granted
Publication of CN115964347B
Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The invention relates to the technical field of data compression, in particular to an intelligent storage method for data of a market supervision and monitoring center, which comprises the following steps: obtaining a characteristic coefficient according to the difference between actual data of the market monitoring center and corresponding historical data in a preset sampling time period; obtaining prediction data according to the historical data and the characteristic coefficient, and obtaining an importance index according to the actual data and the prediction data; constructing a data block, and obtaining a distribution index of the data block according to actual data in the data block and a preset characteristic value; obtaining a structural index of the data block according to a difference value between actual data in the data block; obtaining the fitting degree of the data block according to the importance index, the distribution index and the structural index; and classifying the data blocks according to the fitting degree of the data blocks, and compressing and storing the actual data in each class into a data stream by using an LZW algorithm. The invention shortens the time of data compression and improves the efficiency of data compression.

Description

Intelligent storage method for data of market supervision monitoring center
Technical Field
The invention relates to the technical field of data compression, in particular to an intelligent storage method for data of a market supervision and monitoring center.
Background
The market supervision and monitoring center monitors and analyzes market and industry data, studies and forecasts industry development prospects, analyzes the market potential of products, and tracks market share and the market landscape, so that it can intervene in and adjust the market in time and avoid economic crises.
Because the market supervision and monitoring center performs statistical analysis on nationwide data, the data volume is huge, which places a heavy burden on the data storage of the monitoring center. It is therefore very important to store the data of the market monitoring center with lossless compression.
In the prior art, the LZW algorithm can be used: an input character string is mapped to a code of a certain length based on a dictionary, thereby compressing the data. However, because the data contains many characters, and different orderings of the same characters produce different dictionary entries, the dictionary grows large, retrieval during encoding slows down, and compressing and storing the data becomes less efficient.
Disclosure of Invention
In order to solve the technical problem of low efficiency when an LZW algorithm is adopted to compress and store data, the invention aims to provide an intelligent storage method for data of a market monitoring center, and the adopted technical scheme is as follows:
acquiring actual data of a market monitoring center in a preset sampling time period, and obtaining a characteristic coefficient of each actual data according to the difference between each actual data and corresponding historical data;
obtaining prediction data of each actual data according to the historical data and the characteristic coefficient corresponding to each actual data, and obtaining an importance index of each actual data according to the difference between the actual data and the corresponding prediction data;
constructing a preset number of data blocks according to all actual data in a preset sampling time period, and obtaining a distribution index of the data blocks according to each actual data in the data blocks and a preset characteristic value;
obtaining a structural index of the data block according to the distribution disorder degree of actual data in the data block; obtaining the fitting degree of the data block according to the importance index, the distribution index and the structural index of the actual data in the data block;
and classifying the data blocks according to the fitting degree of the data blocks, and compressing and storing the actual data in each class into a data stream by using an LZW algorithm.
Preferably, the method for obtaining the fitting degree of the data block specifically includes:
for any data block, calculating the mean of the importance indexes of all actual data in the data block, and taking the difference between a preset value and the mean as an adjustment coefficient; computing a weighted sum of the characteristic value, the distribution index and the structural index of the data block and normalizing the result; and multiplying the normalized value by the adjustment coefficient and adding the mean to obtain the fitting degree of the data block.
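The fitting-degree rule above can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the preset value, the weights, or the normalization function, so the defaults here (`preset=1.0`, equal weights, and `s / (1 + s)` for non-negative sums) are hypothetical placeholders, as is the function name.

```python
from statistics import fmean
from typing import Callable, Sequence, Tuple


def fitting_degree(
    importance: Sequence[float],
    feature_value: float,
    distribution_index: float,
    structural_index: float,
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),  # assumed weights
    preset: float = 1.0,  # assumed preset value
    normalize: Callable[[float], float] = lambda s: s / (1.0 + s),  # assumed normalizer
) -> float:
    """Fitting degree = adjustment coefficient * normalized weighted sum + mean importance."""
    mean_importance = fmean(importance)
    # Adjustment coefficient: preset value minus the mean importance index.
    adjustment = preset - mean_importance
    w_feature, w_distribution, w_structure = weights
    weighted_sum = (
        w_feature * feature_value
        + w_distribution * distribution_index
        + w_structure * structural_index
    )
    return adjustment * normalize(weighted_sum) + mean_importance
```

With these defaults, a block whose actual data all have importance 1 gets a fitting degree of 1 regardless of its other indexes, because the adjustment coefficient is then 0.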
Preferably, the method for acquiring the characteristic coefficient specifically includes:
for any actual data, acquiring a first set number of historical data corresponding to the actual data, recording a difference value between two adjacent historical data as a change gradient of the historical data, and acquiring a data difference between the last historical data and the first historical data corresponding to the actual data; and obtaining a characteristic coefficient of actual data according to the data difference and the change gradient of the historical data.
Preferably, the calculation formula of the characteristic coefficient is specifically:
Figure SMS_1 (formula image, not reproduced in this text)
where the symbols (Figures SMS_2 to SMS_8 in the original) denote: the characteristic coefficient of the l-th actual data; the data value of the last historical data corresponding to the l-th actual data (its index is given by Figure SMS_4); the data value of the first historical data corresponding to the l-th actual data; the change gradient of the n-th historical data; the change gradient of the (n+1)-th historical data; and the number of historical data corresponding to the l-th actual data. exp() denotes the exponential function with the natural constant e as its base, and ε denotes a hyper-parameter.
Preferably, the prediction data is obtained by the following formula:
Figure SMS_9 (formula image, not reproduced in this text)
where the symbols (Figures SMS_10 to SMS_17 in the original) denote: the prediction data of the l-th actual data; the characteristic coefficient of the l-th actual data; the data value of the last historical data corresponding to the l-th actual data (its index is given by Figure SMS_11); the data value of the first historical data corresponding to the l-th actual data; the change gradient of the historical data whose index is given by Figure SMS_17; and the number of historical data corresponding to the l-th actual data. e is the natural constant.
Preferably, the method for acquiring the importance index specifically includes:
and taking the normalized value of the absolute value of the difference between the actual data and the corresponding prediction data as the importance index of the actual data.
Preferably, the obtaining of the distribution index of the data block according to each actual data in the data block and the preset characteristic value specifically includes:
and acquiring the fluctuation degree of each actual data in the data block relative to the characteristic value, and taking the fluctuation degree as the distribution index of the data block.
Preferably, the structural index of the data block is obtained from the degree of distribution disorder among the actual data in the data block, specifically by:
Figure SMS_18 (formula image, not reproduced in this text)
where the symbols (Figures SMS_19 to SMS_23 in the original) denote: the structural index of the i-th data block; the data value of the a-th actual data in the i-th data block; the characteristic value of the i-th data block; the total amount of actual data contained in the i-th data block; and the frequency of occurrence of the difference between the a-th actual data and the (a-1)-th actual data in the i-th data block. ln() denotes the logarithmic function with the natural constant e as its base.
Preferably, the preset method of the characteristic value specifically includes:
and taking the average value of the mean value and the mode of all actual data in the data block as the characteristic value.
Preferably, the constructing a preset number of data blocks according to all actual data in a preset sampling time period specifically includes:
and converting all the actual data into a preset number of two-dimensional data matrixes, and forming the two-dimensional data matrixes into data blocks.
The embodiment of the invention at least has the following beneficial effects:
A characteristic coefficient is obtained for each actual data from the difference between the actual data of the market supervision and monitoring center and the corresponding historical data; that is, a difference analysis is performed on the historical data of each actual data, and the characteristic coefficient reflects the change trend of that historical data. Prediction data for each actual data is then obtained from its historical data and characteristic coefficient, so that the prediction follows the change trend of the historical data. An importance index for each actual data is obtained from the difference between the actual data and the corresponding prediction data: comparing the two shows whether the actual data conforms to the change trend of the historical data, and the importance index reflects the importance of the actual data.
Next, the actual data is divided into data blocks. The distribution index of a data block is obtained from each actual data in the block and a preset characteristic value, and reflects the fluctuation of the actual data in the block. The structural index of a data block is obtained from the degree of distribution disorder of the actual data in the block, and reflects how disordered the differences between the actual data are.
Finally, the fitting degree of a data block is obtained from the importance indexes of the actual data in the block, the distribution index and the structural index. The fitting degree serves as a characteristic value of the data block and is used to classify the data blocks, so that similar data blocks, that is, blocks with similar ordering, are placed in one class. Each class of data is compressed with the LZW algorithm, which reduces the data volume of the dictionary, shortens the compression time without sacrificing the compression ratio, and improves the efficiency of data compression.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions and advantages of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a method for intelligently storing data in a market monitoring center according to the present invention.
Detailed Description
To further explain the technical means adopted by the present invention to achieve its intended purpose, and their effects, the specific implementation, structure, features and effects of the intelligent storage method for data of a market supervision and monitoring center according to the present invention are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following describes a specific scheme of the intelligent storage method for the data of the market supervision and monitoring center in detail with reference to the accompanying drawings.
Embodiment:
The specific scenario addressed by the invention is as follows: the monitoring center handles a huge volume of data, which puts considerable pressure on its data storage, and the accuracy of the stored data must also be ensured, so lossless compression of the data is very important. By analyzing the structural combinations among the data, the invention reorders the data so that data intervals with similar ordering are placed in one class, and compresses each class with the LZW algorithm, which shortens the compression time and improves compression efficiency without sacrificing the compression ratio.
Referring to fig. 1, a flowchart of a method for intelligently storing data of a market monitoring center according to an embodiment of the present invention is shown, where the method includes the following steps:
Step one, acquiring actual data of a market monitoring center in a preset sampling time period, and obtaining a characteristic coefficient of each actual data according to the difference between each actual data and corresponding historical data.
First, actual data of the market supervision and monitoring center within a preset sampling time period is acquired. The main sources of the actual data include statistical data from the statistics bureau and from relevant departments such as customs, industry and commerce, and taxation, as well as market monitoring data and data from ad hoc research and collation. Data from the various channels is integrated and preprocessed, for example by removing invalid data and interference data. The data comprises characters, numerical values and the like. The characters of the obtained data are converted into 8-bit binary values through ASCII codes, so that different types of data are converted into a uniform binary code.
ASCII codes range from 0 to 127, corresponding to the binary codes 00000000 to 01111111. The first bit of every code in this range is 0, so this leading 0 can be removed. Each data value then consists of 7 binary bits, and all the data values are concatenated in time order to form a data string.
Further, the data string is segmented: every 8 bits are taken as a new data value, which is converted into a decimal number, and a final group shorter than 8 bits is padded with 0s. Removing the leading 0s, based on the distribution characteristics of the data codes, reduces the amount of computation to some extent.
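The ASCII repacking just described can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function name is hypothetical, and appending the padding 0s at the end of the final group is an assumption, since the text does not say where the padding goes.

```python
def pack_ascii_7bit(text: str) -> list:
    """Drop the leading 0 bit of each ASCII code, concatenate the 7-bit
    values, and re-split the bit string into 8-bit decimal values."""
    for ch in text:
        if ord(ch) > 127:
            raise ValueError(f"non-ASCII character: {ch!r}")
    # Each ASCII code fits in 7 bits once the leading 0 is removed.
    bits = "".join(format(ord(ch), "07b") for ch in text)
    # Assumption: pad the final group with trailing 0s up to 8 bits.
    bits += "0" * ((-len(bits)) % 8)
    return [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
```

For example, eight ASCII characters occupy 56 bits after the leading 0s are dropped, so they repack into seven 8-bit values instead of eight.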
In other embodiments, the different types of data are converted into binary codes and then converted into decimal data, and further actual data of the market monitoring center in the preset sampling time period are obtained.
In this embodiment, the preset sampling time period is one week, and data is collected every minute of every day within that week and recorded as actual data; the implementer can set the period according to the specific implementation scenario.
The LZW algorithm, also known as the string-table compression algorithm, achieves compression by building a string table (a dictionary) that represents longer strings with shorter codes. The LZW algorithm exploits redundancy in the frequency of character occurrences effectively, and its dictionary is generated adaptively, but it generally does not exploit positional redundancy effectively.
The larger the amount of data, the more character combinations it contains. Because the same characters arranged in different orders produce different combinations, the dictionary grows, it takes longer to generate, and lookups during encoding take longer. Therefore, the embodiment of the invention divides the data into blocks and places similar data blocks close together, which reduces the amount of data in the dictionary and the retrieval time and thereby improves the efficiency of data compression.
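To make the dictionary growth concrete, here is a minimal textbook LZW encoder over bytes that also reports the final dictionary size. It is a generic sketch of LZW, not the patent's specific implementation; the function name and the 256-entry initial dictionary are conventional choices rather than details from the text.

```python
def lzw_compress(data: bytes):
    """Return (codes, dictionary_size) for a basic LZW encoding of data."""
    # Start with one dictionary entry per possible byte value.
    dictionary = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            # Keep extending the match while it is already in the dictionary.
            current = candidate
        else:
            # Emit the longest known string and register the new one.
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes, len(dictionary)
```

Every string that is not yet in the dictionary adds one entry, so input with many distinct orderings of the same symbols enlarges the dictionary faster than repetitive input of the same length.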
On this basis, the obtained actual data is analyzed. Because the actual data of the market monitoring center is time-series data, normal data follows a certain change trend under normal conditions; that is, the overall change trend of the actual data is similar within a given time period, and historical data is a strong reference for predicting the change trend of future data. Consequently, data that departs markedly from the change trend of its historical data is more likely to be important: when the data predicted from the change trend of the historical data differs greatly from the actual data, the actual data and the historical data follow different change trends, a data anomaly or similar condition may exist, and the probability that the actual data is important is higher.
Analyzing the variation trend between each actual data and the historical data, namely obtaining the characteristic coefficient of each actual data according to the difference between each actual data and the corresponding historical data, specifically, for any actual data, obtaining a first set number of historical data corresponding to the actual data, obtaining the data difference between the last historical data and the first historical data corresponding to the actual data, and marking the difference between two adjacent historical data as the variation gradient of the historical data; and obtaining a characteristic coefficient of actual data according to the data difference and the change gradient of the historical data.
In this embodiment, the first set number is 20, and the historical data corresponding to the actual data refers to 20 data before the actual data, and the implementer may set the historical data according to a specific implementation scenario. That is, each actual data except the first actual data has corresponding history data, and when the number of history data corresponding to the actual data is less than 20, all data before the actual data is recorded as the history data of the actual data.
The calculation formula of the characteristic coefficient of the actual data is specifically as follows:
Figure SMS_24 (formula image, not reproduced in this text)
where the symbols (Figures SMS_25 to SMS_31 in the original) denote: the characteristic coefficient of the l-th actual data; the data value of the last historical data corresponding to the l-th actual data (its index is given by Figure SMS_27); the data value of the first historical data corresponding to the l-th actual data; the change gradient of the n-th historical data; the change gradient of the (n+1)-th historical data; and the number of historical data corresponding to the l-th actual data, namely the first set number. exp() denotes the exponential function with the natural constant e as its base, and ε denotes a hyper-parameter that prevents the numerator or denominator from being 0; it takes the value 1 in this embodiment.
The change gradient of the n-th historical data (Figures SMS_32 and SMS_33) is the difference between the data value of the n-th historical data and the data value of the (n-1)-th historical data. The change gradient reflects the data difference between two adjacent historical data: the larger the change gradient, the larger the data change between the current historical data and the previous historical data.
The difference between the change gradients of two adjacent historical data (Figure SMS_36) reflects how differently the two historical data change. The weight of the change gradient of the n-th historical data is obtained from this difference: the smaller the difference, the closer the degrees of data change of the two adjacent historical data, the more uniform the change trend, and the larger the weight of the change gradient of the n-th historical data. The exponential function is therefore used to apply a negative-correlation mapping to this difference.
The weighted sum of the change gradients of the historical data gives the degree of data change of the historical data from a local analysis.
The quantity in Figure SMS_39 represents the accumulated value of the change gradients of the historical data (Figure SMS_40), and it gives the degree of data change of the historical data from an overall analysis. The ratio of the overall degree of data change to the local degree of data change describes the change trend of the historical data of the actual data. This ratio is greater than or equal to 1, and the constant 1 is subtracted so that the characteristic coefficient can take the value 0. Specifically, when the change trend of the historical data is uniform, the change gradients of the historical data are all equal, the difference between adjacent change gradients is 0, and the characteristic coefficient of the actual data is 0.
The characteristic coefficients reflect the variation trend of the corresponding historical data of the actual data, when the values of the characteristic coefficients are small, the difference between the variation gradients of the two adjacent historical data is small, the data variation degree between the historical data is relatively close, and the variation trend of the corresponding historical data of the actual data is relatively uniform.
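Since the formula image itself is not available in this text, the following is only one possible reading of the verbal description: the overall change (last minus first historical value) divided by an exponentially weighted sum of the change gradients, with ε added to numerator and denominator, minus 1. The use of absolute values, the weight of 1 for the last gradient (which has no successor), and the function name are assumptions made for illustration.

```python
import math
from typing import Sequence


def characteristic_coefficient(history: Sequence[float], eps: float = 1.0) -> float:
    """One possible reading of the characteristic coefficient.

    history: historical data values, ordered from oldest to newest.
    eps: the hyper-parameter epsilon (1 in the embodiment).
    """
    if not history:
        raise ValueError("history must contain at least one value")
    # Change gradient: difference between adjacent historical values
    # (absolute value is an assumption).
    gradients = [abs(b - a) for a, b in zip(history, history[1:])]
    overall = abs(history[-1] - history[0])
    local = 0.0
    for n, gradient in enumerate(gradients):
        if n + 1 < len(gradients):
            # Smaller difference between adjacent gradients gives a larger weight.
            weight = math.exp(-abs(gradients[n + 1] - gradient))
        else:
            # Assumption: the last gradient has no successor, so weight 1.
            weight = 1.0
        local += weight * gradient
    return (overall + eps) / (local + eps) - 1.0
```

Under this reading a perfectly linear history such as 1, 2, 3, 4, 5 yields 0, matching the statement that a uniform trend gives a characteristic coefficient of 0, while a history with an abrupt jump yields a clearly positive value.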
Step two, obtaining the prediction data of each actual data according to the historical data and the characteristic coefficient corresponding to each actual data, and obtaining the importance index of each actual data according to the difference between the actual data and the corresponding prediction data.
Because the data sources of the market monitoring center are fixed, the actual data is fairly stable under normal conditions. When the change trend of the historical data is uniform, the differences between actual data are likely to be small and the historical data is a strong reference, so the prediction data of each actual data can be obtained from the historical data and its change trend.
On this basis, the prediction data of each actual data is obtained from the historical data and the characteristic coefficient corresponding to that actual data, expressed by the formula:
Figure SMS_42 (formula image, not reproduced in this text)
where the symbols (Figures SMS_43 to SMS_50 in the original) denote: the prediction data of the l-th actual data; the characteristic coefficient of the l-th actual data; the data value of the last historical data corresponding to the l-th actual data (its index is given by Figure SMS_44); the data value of the first historical data corresponding to the l-th actual data; the change gradient of the historical data whose index is given by Figure SMS_50; and the number of historical data corresponding to the l-th actual data. e is the natural constant.
The quantity in Figure SMS_51 is the average degree of data change of the historical data. The closer the characteristic coefficient is to 0, the more uniform the degree of data change in the historical data and the more useful the change trend of the historical data is as a reference; in that case the data value of the prediction data is obtained mainly from the overall change of the historical data, that is, the quantity in Figure SMS_52 dominates the amount of change of the prediction data.
The larger the characteristic coefficient, the more disordered the degree of data change in the historical data and the less useful the change trend of the historical data is as a reference; in that case the data value of the prediction data is obtained from the data change of the historical data closest to the actual data, that is, the quantity in Figure SMS_53 is used as the amount of change of the prediction data.
The symbol in Figure SMS_54 denotes the data value of the last historical data corresponding to the l-th actual data (its index is given by Figure SMS_55). Note that the last historical data is the one closest to the actual data, that is, the two are adjacent; equivalently, the last historical data corresponding to the l-th actual data is the (l-1)-th actual data, so each actual data may itself be historical data for one or more other actual data.
The predicted data of the actual data is obtained based on the variation trend of the historical data and the degree of variation of the data, and when the difference between the data value of the actual data and the data value of the predicted data is smaller, the variation trends of the actual data and the historical data are approximately consistent. When the difference between the data value of the actual data and the data value of the predicted data is larger, it is described that the trend of the actual data is less approximate to the trend of the history data.
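The prediction rule can be sketched as a blend of two candidate changes, weighted by the characteristic coefficient. The exact formula is an image that is not reproduced here, so the blend below, with `exp(-coeff)` as the weight on the average change and the remainder on the most recent change, is an assumed reading of the description; the function name is hypothetical and the coefficient is passed in as a plain number.

```python
import math
from typing import Sequence


def predict_next(history: Sequence[float], coeff: float) -> float:
    """Predict the next value from history (oldest to newest).

    coeff: characteristic coefficient of the value being predicted.
    """
    if not history:
        raise ValueError("history must contain at least one value")
    last = float(history[-1])
    if len(history) < 2:
        # No change information is available; fall back to the last value.
        return last
    # Average change over the whole history window.
    mean_change = (last - history[0]) / (len(history) - 1)
    # Change of the historical data closest to the value being predicted.
    last_change = last - history[-2]
    # Assumption: coeff near 0 trusts the overall trend, large coeff
    # trusts only the most recent change.
    trust = math.exp(-coeff)
    return last + trust * mean_change + (1.0 - trust) * last_change
```

With a coefficient of 0 the prediction extends the average trend of the window; as the coefficient grows, the prediction converges to the last value plus the most recent change.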
On this basis, the importance index of each actual data is obtained from the difference between the actual data and the corresponding prediction data. Specifically, the normalized value of the absolute value of the difference between the actual data and the corresponding prediction data is taken as the importance index of the actual data, expressed as:
Figure SMS_56 (formula image, not reproduced in this text)
where the symbols (Figures SMS_57 to SMS_59 in the original) denote: the importance index of the l-th actual data; the data value of the l-th actual data; and the prediction data of the l-th actual data. sigmoid() is a normalization function.
The quantity in Figure SMS_60 reflects the difference between the l-th actual data and its corresponding prediction data. The larger its value, the less the l-th actual data conforms to the change trend of the historical data, the more likely it is that a data anomaly or similar condition has occurred in the l-th actual data, and the larger the corresponding importance index. The importance index reflects the importance of the actual data: the larger its value, the more important the actual data.
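A minimal sketch of the importance index, assuming that sigmoid() is the standard logistic function applied directly to the absolute difference (the text names the function but the formula image is not reproduced, so any scaling inside it is unknown). The function name is hypothetical.

```python
import math


def importance_index(actual: float, predicted: float) -> float:
    """Normalized absolute difference between actual and predicted data."""
    diff = abs(actual - predicted)
    # Standard logistic function (assumed form of sigmoid()).
    return 1.0 / (1.0 + math.exp(-diff))
```

Under this assumption a perfect prediction gives 0.5, and the index grows toward 1 as the actual data departs further from its prediction.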
And step three, constructing a preset number of data blocks according to all actual data in a preset sampling time period, and obtaining the distribution index of the data blocks according to each actual data in the data blocks and a preset characteristic value.
First, it should be noted that the LZW algorithm sequentially transmits data to the compression module, outputs a label of a character if the character exists in the dictionary, and re-marks the character if the character does not exist, which takes a lot of time to search the dictionary during the compression process, thereby reducing the compression efficiency. The embodiment of the invention carries out block processing on actual data, places relatively close data blocks at close positions, wherein the approximation specifically comprises data proportion approximation, data distribution approximation and data structure approximation, and reduces the length of a dictionary generated by the data during compression, reduces the compression time of the data and improves the compression efficiency by limiting the compression mode of the data.
Based on this, a preset number of data blocks are constructed according to all actual data in a preset sampling time period, specifically, data values of all the actual data are converted into two-dimensional data matrixes, the two-dimensional data matrixes form the data blocks, in addition, the size of each two-dimensional data matrix is the same, an implementer needs to set the size of each two-dimensional data matrix according to a specific implementation scene, and when the size of each two-dimensional data matrix is fixed, the total number of the data blocks, namely the preset number, can be obtained according to the total number of the actual data. In the present embodiment, the amount of data that can be accommodated by the two-dimensional data matrix is set to
Figure SMS_61
and no two-dimensional data matrix contains incomplete data. Here a is a user-defined parameter whose value depends on the performance of the compression device; its default value is 100.
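The block construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `build_blocks` and the parameter `side` are hypothetical, a square `side x side` matrix is assumed because the exact capacity expression is given only in a figure, and a trailing partial block is simply dropped (the text only says that no matrix contains incomplete data, not how leftovers are handled).

```python
from typing import List, Sequence


def build_blocks(values: Sequence[float], side: int) -> List[List[List[float]]]:
    """Split values into consecutive side x side matrices in row-major order.

    Assumption: leftover values that cannot fill a whole matrix are dropped.
    """
    if side <= 0:
        raise ValueError("side must be positive")
    capacity = side * side
    usable = len(values) - len(values) % capacity  # ignore a trailing partial block
    blocks = []
    for start in range(0, usable, capacity):
        chunk = values[start:start + capacity]
        # Cut the flat chunk into `side` rows of `side` values each.
        blocks.append([list(chunk[r * side:(r + 1) * side]) for r in range(side)])
    return blocks


# Ten values with 2 x 2 matrices give two full blocks; the last two values are left over.
blocks = build_blocks(list(range(10)), side=2)
```

With a fixed matrix size, the number of blocks follows directly from the amount of data, which matches the statement that the preset number is determined once the size is fixed.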
Then, a characteristic value corresponding to each data block needs to be preset, and the characteristic value is used as a representative value of all actual data in the data block, and reflects the overall balance condition of all data in the data block. In the present embodiment, the feature value is set as the average of the mode and the mean of all the actual data in the data block.
Further, the distribution index of the data block is obtained from each actual data in the data block and the preset characteristic value. Specifically, the fluctuation degree of the actual data in the data block relative to the characteristic value is computed and used as the distribution index of the data block, expressed by the following formula:
Figure SMS_62
where
Figure SMS_63
indicates the distribution index of the ith data block,
Figure SMS_64
a data value representing the a-th actual data in the i-th data block,
Figure SMS_65
a characteristic value representing the ith data block,
Figure SMS_66
representing the total amount of actual data contained in the ith data block.
Figure SMS_67
reflects the overall balance of the data distribution in the i-th data block. Modeled on the calculation of the standard deviation of the data in the data block,
Figure SMS_68
is compared with the average value of all data in the data block, which gives the fluctuation degree of each actual data in the data block relative to the characteristic value.
When the actual data in the data block fluctuate strongly, the distribution index is large, indicating that the data distribution in the data block is dispersed. The smaller the fluctuation degree of the actual data in the data block, the smaller the distribution index, indicating that the data distribution in the data block is more uniform.
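As a rough illustration of the characteristic value and the distribution index, the sketch below takes the characteristic value as the average of the mode and the mean, as stated in this embodiment, and measures fluctuation with a standard-deviation-style formula centered on the characteristic value. The exact formula appears only in a figure, so the root-mean-square form used here is an assumption, and the function names are hypothetical.

```python
import math
from statistics import mean, mode


def characteristic_value(values):
    """Average of the mode and the mean of the block's data."""
    # statistics.mode returns the first mode encountered when several
    # values tie (Python 3.8+).
    return (mode(values) + mean(values)) / 2


def distribution_index(values):
    """Assumed form: root-mean-square deviation from the characteristic value."""
    t = characteristic_value(values)
    return math.sqrt(sum((v - t) ** 2 for v in values) / len(values))


flat_block = [2, 2, 4, 8]  # mode 2, mean 4, so the characteristic value is 3
t_value = characteristic_value(flat_block)
d_value = distribution_index(flat_block)
```

A block whose values are all identical yields a distribution index of zero, consistent with the statement that a smaller index indicates a more uniform block.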
Step four: obtain the structure index of each data block from the degree of disorder in the distribution of the actual data in the data block; then obtain the fitting degree of the data block from the importance indexes of the actual data in the data block, the distribution index, and the structure index.
First, it should be noted that the acquired data has a structural order. The more similar the structures of data blocks are, the more the size of the dictionary can be reduced when the data blocks are reordered and compressed, which improves compression efficiency. When the dictionary is built, character combinations from an earlier data block are stored in it; if the same combination appears in a later data block, it is only necessary to check whether the combination is already recorded in the dictionary, not whether it occurred before the current character. Combinations formed by the data characters of one data block are therefore stored in the dictionary and reused when similar characters appear in other data blocks, which improves the compression efficiency while preserving the compression ratio.
Then, the degree of disorder of the data in the data block is analyzed: the difference between every two adjacent data in the data block is calculated, and the proportion of occurrences of each difference value is counted. When the structural distribution of the actual data in a data block is similar, the differences between the actual data are also similar. The structure index of the data block is obtained from the degree of disorder in the distribution of the actual data in the data block, expressed by the following formula:
Figure SMS_69
where
Figure SMS_70
indicates the structure index of the ith data block,
Figure SMS_71
a data value representing the a-th actual data in the i-th data block,
Figure SMS_72
a characteristic value representing the ith data block,
Figure SMS_73
indicates the total amount of actual data contained in the ith data block,
Figure SMS_74
denotes the frequency of occurrence of the difference between the a-th actual data and the (a-1)-th actual data in the i-th data block, and ln() denotes the logarithm with the natural constant e as its base.
The distribution entropy of the differences between adjacent actual data in the data block,
Figure SMS_75
reflects how disordered the distribution of differences between adjacent actual data is, and thus reflects the structural relationship of the actual data in the data block. The more similar the differences between actual data in a data block are, the smaller the value of
Figure SMS_76
and the more uniform the actual data distribution in the data block. The more dissimilar the differences between actual data in a data block are, the larger the value of
Figure SMS_77
and the more disordered the actual data distribution in the data block.
Meanwhile, to avoid the case in which different data blocks exhibit the same difference distribution, the distribution entropy is corrected using
Figure SMS_78
For two data blocks, the more similar the composition of their data and the more similar the distribution of their internal data structure, the closer their structure indexes are. The structure index reflects the distribution of data in the data block.
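The entropy part of the structure index can be sketched as below. This covers only the distribution entropy of adjacent differences, using the natural logarithm as the text states; the correction term is given only in a figure and is deliberately left out, so the function `difference_entropy` (a hypothetical name) is not the complete structure index.

```python
import math
from collections import Counter


def difference_entropy(values):
    """Shannon entropy (natural log) of the differences between adjacent values."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    if not diffs:
        return 0.0  # fewer than two values: no differences to measure
    counts = Counter(diffs)
    total = len(diffs)
    # Frequency of each distinct difference, weighted by its log-frequency.
    return -sum((c / total) * math.log(c / total) for c in counts.values())


regular = difference_entropy([1, 2, 3, 4, 5])    # every difference is 1
alternating = difference_entropy([1, 2, 4, 5, 7])  # differences 1, 2, 1, 2
```

A perfectly regular sequence has zero entropy, while a sequence whose differences are split evenly between two values has entropy ln 2, which illustrates why a larger value indicates a more disordered block.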
Finally, the characteristic information of the actual data in a data block is represented by combining the fluctuation degree and structural distribution of the actual data with the distribution of their importance. That is, the fitting degree of the data block is obtained from the importance indexes of the actual data in the data block, the distribution index, and the structure index.
Specifically, for any data block, the average of the importance indexes of all actual data in the data block is calculated, and the difference between a preset value and this average is used as the adjustment coefficient. The characteristic value, distribution index, and structure index of the data block are combined in a weighted sum, and the sum is normalized. The fitting degree of the data block is the product of the adjustment coefficient and the normalized value, plus the average.
The calculation formula of the fitting degree is specifically as follows:
Figure SMS_79
where
Figure SMS_81
indicating the degree of fit of the ith data block,
Figure SMS_85
represents the average of the importance indicators of all the actual data in the ith data block,
Figure SMS_87
a characteristic value representing the ith data block,
Figure SMS_82
indicates the distribution index of the ith data block,
Figure SMS_84
represents the structure index of the i-th data block, and sigmoid() is a normalization function.
Figure SMS_86
Figure SMS_88
and
Figure SMS_80
are the weights; in this embodiment their values are 0.3, and 0.4, respectively, and the implementer can set them according to the specific implementation scenario.
Figure SMS_83
represents the adjustment coefficient; the preset value is 1 in this embodiment.
Figure SMS_89
reflects the importance of the data in the data block. The larger the value of
Figure SMS_90
the more important the actual data in the data block is and the less the data block should be reordered, that is, more important data should be operated on less in order to avoid losing important data; accordingly, the degree of disorder and the structural distribution of the actual data in the data block need less attention. The smaller the value of
Figure SMS_91
the less important the actual data in the data block is, the larger the adjustment coefficient
Figure SMS_92
and the more attention must be paid to the fluctuation degree of the actual data in the data block and to the disorder of the differences.
The fitting degree is taken as the representation value of the data block. When the data in the data block is more important, the representation value is dominated by the importance indexes of the actual data in the data block; when the data is less important, it is dominated by the fluctuation degree of the actual data and the disorder of the differences.
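The verbal description of the fitting degree translates into a short calculation: adjustment coefficient = preset value minus mean importance, and fitting degree = adjustment coefficient times the sigmoid of the weighted sum, plus the mean importance. The sketch below follows that description; the function names are hypothetical, and the default weights are an assumption chosen only for illustration, since the embodiment's weight values are not fully legible in the text.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function used as the normalization."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fitting_degree(importances, char_value, dist_index, struct_index,
                   weights=(0.3, 0.3, 0.4), preset=1.0):
    """Fitting degree of one block.

    weights: assumed illustrative values, not confirmed by the source.
    preset:  the preset value, 1 in the described embodiment.
    """
    mean_importance = sum(importances) / len(importances)
    adjust = preset - mean_importance  # adjustment coefficient
    w1, w2, w3 = weights
    normalized = sigmoid(w1 * char_value + w2 * dist_index + w3 * struct_index)
    return adjust * normalized + mean_importance


# Fully important data: the adjustment coefficient is 0, so the result is the mean importance.
important = fitting_degree([1.0, 1.0], char_value=3.0, dist_index=2.0, struct_index=0.7)
# Unimportant data with all-zero indexes: the result is sigmoid(0) = 0.5.
neutral = fitting_degree([0.0, 0.0], char_value=0.0, dist_index=0.0, struct_index=0.0)
```

The two extreme cases show the intended behavior: important blocks are characterized by their importance alone, while unimportant blocks are characterized by their fluctuation and structure.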
Step five: classify the data blocks according to their fitting degrees, and compress and store the actual data of each category as a data stream using the LZW algorithm.
Specifically, the closer the fitting degrees of data blocks are, the greater the probability that the data blocks share dictionary entries. That is, when the representation values of data blocks are close, either the importance of the data in the blocks is close, or the degree of disorder and structural distribution of the actual data in the blocks are similar. Based on this, the data blocks are arranged by fitting degree in a set order; in this embodiment the set order is from large to small, and the implementer can choose the order according to the specific implementation scenario.
Furthermore, the arranged data blocks are classified by fitting degree into a plurality of categories, each containing at least two data blocks. The implementer can select a suitable clustering algorithm according to the specific implementation scenario. Within each category, the data blocks are close in importance or similar in data distribution, so the actual data of the data blocks in each category can be combined into one data stream.
In this embodiment, the two-dimensional data matrix of each data block in a category is converted into a one-dimensional data sequence, and the one-dimensional data sequences are concatenated in the set order, namely in descending order of the fitting degree of the data blocks, to form the data stream of the category. The LZW algorithm is then applied to the data stream of each category to compress it.
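The ordering, grouping, and flattening steps can be sketched as follows. The text leaves the clustering algorithm to the implementer, so the gap-based grouping here is a stand-in chosen for brevity, not the patent's method; `group_by_fitting`, `to_stream`, and `max_gap` are hypothetical names. Note that this simple rule does not by itself guarantee at least two blocks per category.

```python
def group_by_fitting(fits, max_gap):
    """Sort block indices by fitting degree (descending) and start a new
    group whenever the drop to the next block exceeds max_gap."""
    order = sorted(range(len(fits)), key=lambda i: fits[i], reverse=True)
    groups = []
    for idx in order:
        if groups and fits[groups[-1][-1]] - fits[idx] <= max_gap:
            groups[-1].append(idx)
        else:
            groups.append([idx])
    return groups


def to_stream(blocks, group):
    """Flatten the 2-D matrices of one group, row by row, into one sequence."""
    return [value for i in group for row in blocks[i] for value in row]


fits = [0.91, 0.40, 0.88, 0.42]
blocks = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
    [[13, 14], [15, 16]],
]
groups = group_by_fitting(fits, max_gap=0.05)
stream = to_stream(blocks, groups[0])
```

Each resulting stream would then be compressed on its own, with its own dictionary, as the text specifies.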
It should be noted that each category corresponds to one data stream and one dictionary. The LZW algorithm is a well-known technique and is only briefly introduced here. For any data stream, the first data P and the second data C in the data stream are obtained, and the compression rules are as follows:
(1) If data P + data C is in the dictionary, set P = P + C, output nothing, take the next data as C, and repeat this step;
(2) If data P + data C is not in the dictionary, output the label of the current data P in the dictionary, add P + C to the dictionary, and set P = C.
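The two rules above correspond to the standard LZW encoder loop. The following is a minimal sketch over a byte stream; the choice of a 256-entry initial dictionary (one entry per byte value) and the function name `lzw_compress` are assumptions for illustration, since the text does not say how the dictionary is initialized.

```python
from typing import List


def lzw_compress(data: bytes) -> List[int]:
    """Textbook LZW: emit dictionary labels, growing the dictionary as we go."""
    dictionary = {bytes([i]): i for i in range(256)}  # assumed initial dictionary
    next_code = 256
    codes = []
    p = b""
    for byte in data:
        c = bytes([byte])
        pc = p + c
        if pc in dictionary:
            # Rule (1): P + C is known, extend P and output nothing.
            p = pc
        else:
            # Rule (2): output the label of P, record P + C, restart from C.
            codes.append(dictionary[p])
            dictionary[pc] = next_code
            next_code += 1
            p = c
    if p:
        codes.append(dictionary[p])  # flush the final pending string
    return codes


codes = lzw_compress(b"ABABABA")
```

Repeated patterns are replaced by single labels once they enter the dictionary, which is why streams built from similar blocks tend to need fewer new dictionary entries.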
Finally, after the compressed data of each category is obtained, it is stored according to the requirements of the market supervision and monitoring center. When data needs to be retrieved and decompressed, a retrieval instruction is sent to the storage device, which, on receiving the instruction, retrieves the compressed data together with its corresponding dictionary.
In this embodiment, decompression proceeds as follows: the compressed data is decompressed with the dictionary using LZW decompression, which is a well-known technique and is not described in detail here. The decompressed data is restored to the order in which it was arranged before compression and converted from decimal to binary. The binary data is then divided into 7-bit segments, and a 0 is prepended to each segment as its leading bit.
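For completeness, a sketch of standard LZW decompression and of the 7-bit step is given below. It assumes the same 256-entry initial dictionary as a conventional byte-oriented encoder and rebuilds the dictionary while decoding, which is the textbook variant; the embodiment instead retrieves a stored dictionary, so this is an illustration of the technique rather than of the patent's storage layout. The helper `to_padded_bits` reflects one reading of the final step (each value fits in 7 bits and receives a leading 0) and is likewise an assumption.

```python
from typing import List


def lzw_decompress(codes: List[int]) -> bytes:
    """Textbook LZW decoder that rebuilds the dictionary while decoding."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}  # assumed initial dictionary
    next_code = 256
    prev = dictionary[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The code refers to the entry being defined right now.
            entry = prev + prev[:1]
        else:
            raise ValueError("invalid LZW code: %d" % code)
        out.append(entry)
        dictionary[next_code] = prev + entry[:1]
        next_code += 1
        prev = entry
    return b"".join(out)


def to_padded_bits(value: int) -> str:
    """Write a value below 128 as 7 binary digits with a leading 0."""
    if not 0 <= value < 128:
        raise ValueError("value does not fit in 7 bits")
    return "0" + format(value, "07b")


restored = lzw_decompress([65, 66, 256, 258])
bits = to_padded_bits(65)
```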
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; the modifications or substitutions do not make the essence of the corresponding technical solutions deviate from the technical solutions of the embodiments of the present application, and are included in the protection scope of the present application.

Claims (10)

1. An intelligent storage method for market supervision and monitoring center data is characterized by comprising the following steps:
acquiring actual data of a market monitoring center in a preset sampling time period, and obtaining a characteristic coefficient of each actual data according to the difference between each actual data and corresponding historical data;
obtaining prediction data of each actual data according to the historical data and the characteristic coefficient corresponding to each actual data, and obtaining an importance index of each actual data according to the difference between the actual data and the corresponding prediction data;
constructing a preset number of data blocks according to all actual data in a preset sampling time period, and obtaining a distribution index of the data blocks according to each actual data in the data blocks and a preset characteristic value;
obtaining a structural index of the data block according to the distribution disorder degree of actual data in the data block; obtaining the fitting degree of the data block according to the importance index, the distribution index and the structural index of the actual data in the data block;
and classifying the data blocks according to the fitting degree of the data blocks, and compressing and storing the actual data in each class into a data stream by using an LZW algorithm.
2. The intelligent storage method for the data of the market supervision and monitoring center according to claim 1, wherein the method for obtaining the fitting degree of the data block specifically comprises the following steps:
for any data block, calculating the average value of the importance indexes of all actual data in the data block, and taking the difference value between a preset value and the average value as an adjusting coefficient; and carrying out weighted summation on the characteristic value, the distribution index and the structure index of the data block to obtain a normalized value of a summation result, and calculating a product between an adjusting coefficient and the normalized value and a sum of the mean value to obtain the fitting degree of the data block.
3. The intelligent storage method for the data of the market supervision and monitoring center according to claim 1, wherein the method for acquiring the characteristic coefficient specifically comprises the following steps:
for any actual data, acquiring a first set number of historical data corresponding to the actual data, recording a difference value between two adjacent historical data as a change gradient of the historical data, and acquiring a data difference between the last historical data and the first historical data corresponding to the actual data; and obtaining the characteristic coefficient of the actual data according to the data difference and the change gradient of the historical data.
4. The intelligent storage method for the data of the market supervision and monitoring center according to claim 3, wherein the calculation formula of the characteristic coefficient is specifically as follows:
Figure QLYQS_1
where
Figure QLYQS_2
represents the characteristic coefficient of the i-th actual data,
Figure QLYQS_3
represents the data value of the
Figure QLYQS_4
-th historical data corresponding to the i-th actual data,
Figure QLYQS_5
represents the data value of the first historical data corresponding to the i-th actual data,
Figure QLYQS_6
represents the change gradient of the n-th historical data,
Figure QLYQS_7
represents the change gradient of the (n+1)-th historical data,
Figure QLYQS_8
represents the number of historical data corresponding to the i-th actual data; exp() represents the exponential function with the natural constant e as its base, and ε represents a hyper-parameter.
5. The intelligent storage method for the data of the market supervision and monitoring center according to claim 1, wherein the method for obtaining the prediction data specifically comprises:
Figure QLYQS_9
where
Figure QLYQS_11
represents the prediction data of the i-th actual data,
Figure QLYQS_14
represents the characteristic coefficient of the i-th actual data,
Figure QLYQS_16
represents the data value of the
Figure QLYQS_12
-th historical data corresponding to the i-th actual data,
Figure QLYQS_13
represents the data value of the first historical data corresponding to the i-th actual data,
Figure QLYQS_15
represents the change gradient of the
Figure QLYQS_17
-th historical data,
Figure QLYQS_10
represents the number of historical data corresponding to the i-th actual data, and e is the natural constant.
6. The intelligent storage method for the data of the market supervision and monitoring center according to claim 1, wherein the method for obtaining the importance index specifically comprises the following steps:
and taking the normalized value of the absolute value of the difference between the actual data and the corresponding prediction data as the importance index of the actual data.
7. The intelligent storage method for the data of the market supervision and monitoring center according to claim 1, wherein the obtaining of the distribution index of the data block according to each actual data and the preset characteristic value in the data block specifically comprises:
and acquiring the fluctuation degree of each actual data in the data block relative to the characteristic value, and taking the fluctuation degree as the distribution index of the data block.
8. The intelligent storage method for the data of the market supervision and monitoring center according to claim 1, wherein the obtaining of the structural index of the data block according to the degree of distribution confusion among the actual data in the data block is specifically as follows:
Figure QLYQS_18
where
Figure QLYQS_19
represents the structural index of the i-th data block,
Figure QLYQS_20
represents the data value of the a-th actual data in the i-th data block,
Figure QLYQS_21
represents the characteristic value of the i-th data block,
Figure QLYQS_22
represents the total amount of actual data contained in the i-th data block,
Figure QLYQS_23
represents the frequency of occurrence of the difference between the a-th actual data and the (a-1)-th actual data in the i-th data block, and ln() represents the logarithm with the natural constant e as its base.
9. The intelligent storage method for the data of the market supervision and monitoring center according to claim 1, wherein the preset method of the characteristic value is specifically as follows:
and taking the average value of the mean value and the mode of all actual data in the data block as the characteristic value.
10. The method for intelligently storing the data of the market supervision and monitoring center according to claim 1, wherein the constructing of the data blocks of the preset number according to all the actual data in the preset sampling time period specifically comprises:
and converting all the actual data into a preset number of two-dimensional data matrixes, and forming the two-dimensional data matrixes into data blocks.
CN202310253184.1A 2023-03-16 2023-03-16 Intelligent storage method for data of market supervision and monitoring center Active CN115964347B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310253184.1A CN115964347B (en) 2023-03-16 2023-03-16 Intelligent storage method for data of market supervision and monitoring center

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310253184.1A CN115964347B (en) 2023-03-16 2023-03-16 Intelligent storage method for data of market supervision and monitoring center

Publications (2)

Publication Number Publication Date
CN115964347A true CN115964347A (en) 2023-04-14
CN115964347B CN115964347B (en) 2023-05-16

Family

ID=85894731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310253184.1A Active CN115964347B (en) 2023-03-16 2023-03-16 Intelligent storage method for data of market supervision and monitoring center

Country Status (1)

Country Link
CN (1) CN115964347B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116185306A (en) * 2023-04-24 2023-05-30 山东爱福地生物股份有限公司 Sewage treatment system data storage method using potamogeton crispus

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201413877D0 (en) * 2014-08-05 2014-09-17 Illumina Cambridge Ltd Methods and systems for data analysis and compression
CN105335384A (en) * 2014-06-30 2016-02-17 中航商用航空发动机有限责任公司 Storage method for monitoring data, reproduction method for monitoring data and devices
CN105469601A (en) * 2015-12-09 2016-04-06 浙江工业大学 A road traffic space data compression method based on LZW coding
CN105788261A (en) * 2016-04-15 2016-07-20 浙江工业大学 Road traffic space data compression method based on PCA and LZW coding
US20170139947A1 (en) * 2015-11-16 2017-05-18 International Business Machines Corporation Columnar database compression
CN109274742A (en) * 2018-09-27 2019-01-25 北京工业大学 A kind of internet of things data acquisition and supervisor control
US20190268017A1 (en) * 2019-05-08 2019-08-29 Vinodh Gopal Self-checking compression
US20200091930A1 (en) * 2018-09-14 2020-03-19 Hewlett Packard Enterprise Development Lp Floating point data set compression
CN112541256A (en) * 2020-12-01 2021-03-23 中国石油大学(华东) Deep learning dimensionality reduction reconstruction-based strong heterogeneous reservoir history fitting method
CN112819189A (en) * 2019-11-15 2021-05-18 内蒙古电力(集团)有限责任公司内蒙古电力科学研究院分公司 Wind power output prediction method based on historical predicted value
CN114155048A (en) * 2021-12-29 2022-03-08 中国建设银行股份有限公司 Method and device for predicting associated business, electronic equipment and storage medium
CN114881101A (en) * 2022-03-21 2022-08-09 武汉大学 Power system typical scene associated feature selection method based on bionic search
CN115128978A (en) * 2022-06-17 2022-09-30 淮阴工学院 Internet of things environment big data detection and intelligent monitoring system
CN115599757A (en) * 2021-07-08 2023-01-13 华为技术有限公司(Cn) Data compression method and device, computing equipment and storage system
CN115801901A (en) * 2023-01-05 2023-03-14 安徽皖欣环境科技有限公司 Compression processing method for enterprise production emission data

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
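BRIAN WOREK et al.: "Enabling Approximate Storage through Lossy Media Data Compression", ACM *
TAO TIAN et al.: "Perceptual Image Compression with Block-Level Just Noticeable Difference Prediction", ACM *
刘林: "Radar data compression technology based on an optimized LZW algorithm", Ship Science and Technology *
姚学忠; 尚江峰; 曹晶晶; 盛步云; 吴志宏; 宋寅: "Analog data compression method for machine-pump monitoring based on similarity matching", Modular Machine Tool & Automatic Manufacturing Technique *
张荣金; 张泉灵: "Performance monitoring of model predictive control based on historical performance benchmarks", Computers and Applied Chemistry *
杨永军; 徐江; 舒逸; 许帅: "Research on lossless compression algorithms for historical data in real-time databases", Computer and Modernization *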

Also Published As

Publication number Publication date
CN115964347B (en) 2023-05-16

Similar Documents

Publication Publication Date Title
CN116192971B (en) Intelligent cloud energy operation and maintenance service platform data management method
CN116681036B (en) Industrial data storage method based on digital twinning
CN109859281B (en) Compression coding method of sparse neural network
EP2455853A2 (en) Data compression method
CN115543946B (en) Financial big data optimized storage method
CN115964347B (en) Intelligent storage method for data of market supervision and monitoring center
CN117155407B (en) Intelligent mirror cabinet disinfection log data optimal storage method
CN116016606B (en) Sewage treatment operation and maintenance data efficient management system based on intelligent cloud
CN115204754B (en) Heating power supply and demand information management platform based on big data
CN115695564B (en) Efficient transmission method of Internet of things data
CN117290364B (en) Intelligent market investigation data storage method
CN115883670A (en) Medical data analysis and acquisition method and device
US20240273121A1 (en) Database data compression method and storage device
CN116318172A (en) Design simulation software data self-adaptive compression method
CN115858476A (en) Efficient storage method for user-defined form acquisition data in web development system
CN116700630A (en) Organic-inorganic compound fertilizer production data optimized storage method based on Internet of things
CN116032294A (en) Intelligent processing method for atmosphere monitoring data
CN115913247A (en) Deep lossless compression method and system for high-frequency power data
CN114221663A (en) Real-time spectrum data compression and recovery method based on character coding
CN116743182B (en) Lossless data compression method
CN116961672A (en) Lossless data compression method based on transducer encoder
CN116320501A (en) Infrared data compression method and readable storage medium
CN115567058A (en) Time sequence data lossy compression method combining prediction and coding
CN108259515A (en) A kind of lossless source compression method suitable for transmission link under Bandwidth-Constrained
WO2009088967A1 (en) Storage of stochastic information in stochastic information systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant