CN115964347A

CN115964347A - Intelligent storage method for data of market supervision monitoring center

Info

Publication number: CN115964347A
Application number: CN202310253184.1A
Authority: CN
Inventors: 刘福建; 孔建彪; 钱瑞娜; 徐子栋; 刘洋
Original assignee: Heze Product Inspection And Testing Institute
Current assignee: Heze Product Inspection And Testing Institute
Priority date: 2023-03-16
Filing date: 2023-03-16
Publication date: 2023-04-14
Anticipated expiration: 2043-03-16
Also published as: CN115964347B

Abstract

The invention relates to the technical field of data compression, in particular to an intelligent storage method for data of a market supervision and monitoring center, which comprises the following steps: obtaining a characteristic coefficient according to the difference between actual data of the market monitoring center and corresponding historical data in a preset sampling time period; obtaining prediction data according to the historical data and the characteristic coefficient, and obtaining an importance index according to the actual data and the prediction data; constructing a data block, and obtaining a distribution index of the data block according to actual data in the data block and a preset characteristic value; obtaining a structural index of the data block according to a difference value between actual data in the data block; obtaining the fitting degree of the data block according to the importance index, the distribution index and the structural index; and classifying the data blocks according to the fitting degree of the data blocks, and compressing and storing the actual data in each class into a data stream by using an LZW algorithm. The invention shortens the time of data compression and improves the efficiency of data compression.

Description

Intelligent storage method for data of market supervision monitoring center

Technical Field

The invention relates to the technical field of data compression, in particular to an intelligent storage method for data of a market supervision and monitoring center.

Background

The market supervision and monitoring center monitors and analyzes market and industry data, studies and forecasts industry development prospects, analyzes the market potential of products, tracks market share and market structure, and intervenes in and adjusts the market in a timely manner to help avert economic crises.

The market supervision and monitoring center performs statistical analysis on nationwide data, so the data volume is huge and places a heavy burden on the data storage of the monitoring center. It is therefore very important to store the data of the market monitoring center using lossless compression.

In the prior art, the LZW algorithm can be used: input character strings are mapped to codes of a certain length based on a dictionary, which achieves data compression. However, because the data contain many characters and different combinations of the same characters produce different dictionary entries, the dictionary grows, retrieval during encoding becomes slower, and compressing and storing the data is inefficient.

Disclosure of Invention

In order to solve the technical problem of low efficiency when an LZW algorithm is adopted to compress and store data, the invention aims to provide an intelligent storage method for data of a market monitoring center, and the adopted technical scheme is as follows:

acquiring actual data of a market monitoring center in a preset sampling time period, and obtaining a characteristic coefficient of each actual data according to the difference between each actual data and corresponding historical data;

obtaining prediction data of each actual data according to the historical data and the characteristic coefficient corresponding to each actual data, and obtaining an importance index of each actual data according to the difference between the actual data and the corresponding prediction data;

constructing a preset number of data blocks according to all actual data in a preset sampling time period, and obtaining a distribution index of the data blocks according to each actual data in the data blocks and a preset characteristic value;

obtaining a structural index of the data block according to the distribution disorder degree of actual data in the data block; obtaining the fitting degree of the data block according to the importance index, the distribution index and the structural index of the actual data in the data block;

and classifying the data blocks according to the fitting degree of the data blocks, and compressing and storing the actual data in each class into a data stream by using an LZW algorithm.

Preferably, the method for obtaining the fitting degree of the data block specifically includes:

for any data block, the average of the importance indexes of all actual data in the data block is calculated, and the difference between a preset value and this average is taken as an adjusting coefficient; the characteristic value, the distribution index and the structural index of the data block are combined by weighted summation and the result is normalized; the fitting degree of the data block is the product of the adjusting coefficient and the normalized value, plus the average importance index.
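The fitting-degree rule above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the preset value (1.0), the equal weights of 1/3 and the min-max normalization of the weighted sums across all blocks are assumptions, since the text does not fix them, and the function name `fitting_degrees` is hypothetical.

```python
from statistics import mean

def fitting_degrees(importance_per_block, char_values, dist_indices,
                    struct_indices, preset=1.0, weights=(1/3, 1/3, 1/3)):
    """Return one fitting degree per data block.

    F_i = (preset - mean(I)) * norm(w1*T_i + w2*D_i + w3*S_i) + mean(I)
    `preset`, `weights` and the min-max normalisation are assumptions.
    """
    w1, w2, w3 = weights
    sums = [w1 * t + w2 * d + w3 * s
            for t, d, s in zip(char_values, dist_indices, struct_indices)]
    lo, hi = min(sums), max(sums)
    span = hi - lo
    # Min-max normalisation across blocks; identical sums all map to 0.
    normed = [(v - lo) / span if span > 0 else 0.0 for v in sums]
    degrees = []
    for imps, nv in zip(importance_per_block, normed):
        mean_imp = mean(imps)          # mean importance index of the block
        adjust = preset - mean_imp     # adjusting coefficient
        degrees.append(adjust * nv + mean_imp)
    return degrees
```

For example, `fitting_degrees([[0.6, 0.8], [0.5, 0.5]], [10.0, 20.0], [1.0, 3.0], [0.2, 0.4])` gives approximately `[0.7, 1.0]`. With a preset value of 1 and a normalized value in [0, 1], each fitting degree lies between the block's mean importance index and 1.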

Preferably, the method for acquiring the characteristic coefficient specifically includes:

for any actual data, acquiring a first set number of historical data corresponding to the actual data, recording a difference value between two adjacent historical data as a change gradient of the historical data, and acquiring a data difference between the last historical data and the first historical data corresponding to the actual data; and obtaining a characteristic coefficient of actual data according to the data difference and the change gradient of the historical data.

Preferably, the calculation formula of the characteristic coefficient is specifically:

where $K_l$ denotes the characteristic coefficient of the $l$-th actual data; $h_{l,N}$ denotes the data value of the $N$-th (that is, the last) historical data corresponding to the $l$-th actual data; $h_{l,1}$ denotes the data value of the first historical data corresponding to the $l$-th actual data; $\Delta_n$ and $\Delta_{n+1}$ denote the change gradients of the $n$-th and $(n+1)$-th historical data; $N$ denotes the number of historical data corresponding to the $l$-th actual data; $\exp(\cdot)$ denotes the exponential function with the natural constant $e$ as its base; and $\varepsilon$ denotes a hyper-parameter.

Preferably, the method for obtaining the prediction data specifically comprises:

where $P_l$ denotes the prediction data of the $l$-th actual data; $K_l$ denotes the characteristic coefficient of the $l$-th actual data; $h_{l,N}$ denotes the data value of the $N$-th historical data corresponding to the $l$-th actual data; $h_{l,1}$ denotes the data value of the first historical data corresponding to the $l$-th actual data; $\Delta_N$ denotes the change gradient of the $N$-th historical data; $N$ denotes the number of historical data corresponding to the $l$-th actual data; and $e$ is the natural constant.

Preferably, the method for acquiring the importance index specifically includes:

The normalized absolute value of the difference between the actual data and its corresponding prediction data is taken as the importance index of the actual data.

Preferably, the obtaining of the distribution index of the data block according to each actual data in the data block and the preset characteristic value specifically includes:

The degree of fluctuation of the actual data in the data block relative to the characteristic value is acquired and taken as the distribution index of the data block.

Preferably, obtaining the structural index of the data block according to the distribution disorder degree among the actual data in the data block specifically includes:

where $S_i$ denotes the structural index of the $i$-th data block; $x_{i,a}$ denotes the data value of the $a$-th actual data in the $i$-th data block; $T_i$ denotes the characteristic value of the $i$-th data block; $A_i$ denotes the total number of actual data contained in the $i$-th data block; $p_{i,a}$ denotes the frequency of occurrence of the difference between the $a$-th and the $(a-1)$-th actual data in the $i$-th data block; and $\ln(\cdot)$ denotes the logarithmic function with the natural constant $e$ as its base.

Preferably, the preset method of the characteristic value specifically includes:

The average of the mean and the mode of all actual data in the data block is taken as the characteristic value.

Preferably, the constructing a preset number of data blocks according to all actual data in a preset sampling time period specifically includes:

All the actual data are converted into a preset number of two-dimensional data matrices, and each two-dimensional data matrix forms a data block.

The embodiments of the invention have at least the following beneficial effects:

A characteristic coefficient is obtained for each actual data of the market supervision and monitoring center from the difference between the actual data and its corresponding historical data; that is, a difference analysis is performed on the historical data of each actual data, and the characteristic coefficient reflects the change trend of that historical data. Prediction data are then obtained for each actual data from its historical data and characteristic coefficient, so that each prediction follows the change trend of the history. The importance index of each actual data is obtained from the difference between the actual data and its prediction: this comparison shows whether the actual data follows the change trend of the historical data, and the importance index expresses how important the actual data is. Next, the actual data are divided into data blocks. The distribution index of a data block, obtained from the actual data in the block and a preset characteristic value, reflects how the actual data in the block fluctuate; the structural index, obtained from the distribution disorder degree of the actual data in the block, reflects how disordered the differences between those data are. Finally, the fitting degree of each data block is obtained from the importance indexes of its actual data, its distribution index and its structural index, and is used as a feature value for classifying the data blocks, so that similar data blocks, i.e. blocks with similar ordering, are placed in the same class. Compressing each class with the LZW algorithm reduces the size of the dictionary, shortens compression time without sacrificing compression ratio, and improves compression efficiency.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions and advantages of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

Fig. 1 is a flowchart of a method for intelligently storing data in a market monitoring center according to the present invention.

Detailed Description

To further explain the technical means adopted by the invention to achieve its intended purpose and their effects, the specific implementation, structure, features and effects of the intelligent data storage method for a market monitoring center according to the invention are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different occurrences of "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The following describes a specific scheme of the intelligent storage method for the data of the market supervision and monitoring center in detail with reference to the accompanying drawings.

Embodiment:

The specific scenario addressed by the invention is as follows: the monitoring center holds a very large volume of data, which puts considerable pressure on its data storage, while the accuracy of the stored data must also be guaranteed, so lossless compression of the data is essential. By analyzing the structural combinations among the data, the invention reorders the data so that data intervals with similar ordering are placed in the same class, and compresses each class with the LZW algorithm, thereby shortening compression time and improving compression efficiency without sacrificing compression ratio.

Referring to fig. 1, a flowchart of a method for intelligently storing data of a market monitoring center according to an embodiment of the present invention is shown, where the method includes the following steps:

Step one, acquiring actual data of the market monitoring center within a preset sampling time period, and obtaining a characteristic coefficient of each actual data according to the difference between each actual data and its corresponding historical data.

First, the actual data of the market supervision and monitoring center within a preset sampling time period are acquired. Their main sources include statistical data from the statistics bureau, data from relevant departments such as customs, industry and commerce, and tax authorities, market monitoring data, and data compiled from real-time surveys. Data from all channels are integrated and preprocessed, for example by removing invalid and interfering data; the data include characters, numerical values and the like. Characters are converted into 8-bit binary values through ASCII encoding, so that data of different types are converted into a uniform binary code.

ASCII codes range from 0 to 127, corresponding to the binary codes 00000000 to 01111111, so the first bit of every value in this range is 0. This leading 0 can therefore be removed, after which every 7 bits form one data value, and all data values are concatenated in time order into a bit string.

Further, the bit string is split into groups of 8 bits, each group forming a new data value that is converted into decimal; if the final group has fewer than 8 bits, it is padded with 0s. By exploiting the distribution of the data codes and removing the redundant 0 bit, the amount of computation is reduced to some extent.
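The 7-bit repacking described above can be sketched in Python as follows, assuming the input is a plain ASCII string; the function name `pack_ascii_7bit` is hypothetical. Every 8 characters (56 bits) fit into 7 output values.

```python
from __future__ import annotations

def pack_ascii_7bit(text: str) -> list[int]:
    """Drop the always-zero top bit of each ASCII code, concatenate the
    7-bit groups in order, re-split into 8-bit groups (padding the last
    group with trailing 0s) and return the decimal value of each group."""
    bits = []
    for ch in text:
        code = ord(ch)
        if code > 127:
            raise ValueError(f"non-ASCII character: {ch!r}")
        bits.append(format(code, "07b"))   # 7-bit binary, leading 0 removed
    stream = "".join(bits)
    values = []
    for start in range(0, len(stream), 8):
        chunk = stream[start:start + 8].ljust(8, "0")  # pad the final chunk with 0s
        values.append(int(chunk, 2))
    return values
```

For example, `pack_ascii_7bit("AB")` returns `[131, 8]`. A decoder needs the original character count (or another way to know where to stop), because up to seven padding zeros could otherwise be read as an extra NUL character.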

In other embodiments, data of different types are converted into binary codes and then into decimal data, yielding the actual data of the market monitoring center within the preset sampling time period.

In this embodiment, the preset sampling time period is one week: data are collected every minute of every day during the week and recorded as actual data. The implementer can set the period according to the specific implementation scenario.

The LZW (Lempel-Ziv-Welch) algorithm, a string-table compression algorithm, achieves compression by building a string table, i.e. a dictionary, in which longer strings are represented by shorter codes. LZW effectively exploits redundancy in the frequency of character occurrences, and its dictionary is generated adaptively, but it generally does not exploit positional redundancy effectively.
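For reference, a minimal textbook LZW compressor and decompressor over byte strings is sketched below in Python. The function names are hypothetical and the dictionary is unbounded; practical implementations usually cap the code width and reset or freeze the dictionary when it fills.

```python
from __future__ import annotations

def lzw_compress(data: bytes) -> list[int]:
    # Initialise the dictionary with all 256 single-byte strings.
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                 # keep extending the match
        else:
            codes.append(dictionary[current])   # emit code of longest match
            dictionary[candidate] = next_code   # learn the new string
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes

def lzw_decompress(codes: list[int]) -> bytes:
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            entry = previous + previous[:1]     # string not yet in the table
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)
```

Each dictionary lookup in `lzw_compress` is the retrieval step whose cost grows with the number of distinct character combinations, which is the inefficiency the method targets.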

The larger the amount of data, the more character combinations it contains and the larger the dictionary becomes; because the same characters in different positional orders form different combinations, the dictionary grows further. A larger dictionary takes longer to generate and to search during encoding. Therefore, the embodiment of the invention divides the data into blocks and places similar data blocks at adjacent positions, reducing the size of the dictionary and the retrieval time and thereby improving compression efficiency.

On this basis, the acquired actual data are analyzed. Because the actual data of the market monitoring center are time-stream data, normal data follow a certain trend; that is, the overall trend of the actual data is similar within a given period, and historical data provide a strong basis for predicting the trend of future data. Data that deviate strongly from the change trend of the historical data are therefore more likely to be important: when actual data differ from the data predicted from the historical trend, the actual data and the historical data follow different trends, a data anomaly or a similar event may have occurred, and the actual data are more likely to be important.

Analyzing the variation trend between each actual data and the historical data, namely obtaining the characteristic coefficient of each actual data according to the difference between each actual data and the corresponding historical data, specifically, for any actual data, obtaining a first set number of historical data corresponding to the actual data, obtaining the data difference between the last historical data and the first historical data corresponding to the actual data, and marking the difference between two adjacent historical data as the variation gradient of the historical data; and obtaining a characteristic coefficient of actual data according to the data difference and the change gradient of the historical data.

In this embodiment, the first set number is 20, and the historical data corresponding to an actual data are the 20 data immediately preceding it; the implementer may set this number according to the specific implementation scenario. Every actual data except the first has corresponding historical data, and when fewer than 20 data precede an actual data, all preceding data are taken as its historical data.

The calculation formula of the characteristic coefficient of the actual data is specifically as follows:

where $K_l$ denotes the characteristic coefficient of the $l$-th actual data; $h_{l,N}$ denotes the data value of the $N$-th historical data, namely the last historical data corresponding to the $l$-th actual data; $\Delta_n$ and $\Delta_{n+1}$ denote the change gradients of the $n$-th and $(n+1)$-th historical data; $N$ denotes the number of historical data corresponding to the $l$-th actual data, namely the first set number; $\exp(\cdot)$ denotes the exponential function with the natural constant $e$ as its base; and $\varepsilon$ is a hyper-parameter that prevents the numerator or denominator from being 0, set to 1 in this embodiment.

The change gradient of the $n$-th historical data is $\Delta_n = h_{l,n} - h_{l,n-1}$, where $h_{l,n}$ denotes the data value of the $n$-th historical data and $h_{l,n-1}$ denotes the data value of the $(n-1)$-th historical data. The change gradient reflects the data difference between two adjacent historical data: the larger the change gradient, the larger the change from the previous historical data to the current one.

$|\Delta_{n+1} - \Delta_n|$ is the difference between the change gradients of two adjacent historical data and reflects how much their degrees of data change differ. The weight of the change gradient of the $n$-th historical data is obtained by mapping this difference through the exponential function with negative correlation: the smaller $|\Delta_{n+1} - \Delta_n|$, the closer the degrees of data change of the two adjacent historical data, the more uniform the change trend, and the larger the weight of the change gradient of the $n$-th historical data.

Weighting and summing the change gradients of the historical data yields the degree of data change of the historical data from a local analysis.

$h_{l,N} - h_{l,1} = \sum_{n=2}^{N} \Delta_n$ is the accumulated value of the change gradients of the historical data, and the degree of data change of the historical data from an overall analysis is obtained from it. The ratio of the overall degree of data change to the local degree of data change describes the change trend of the historical data corresponding to the actual data; since this ratio is greater than or equal to 1, the constant 1 is subtracted so that the characteristic coefficient can take the value 0. Specifically, when the change trend of the historical data is uniform, all change gradients are equal, the differences between adjacent change gradients are 0, and the characteristic coefficient of the actual data is 0.

The characteristic coefficient reflects the change trend of the historical data corresponding to the actual data: the smaller its value, the smaller the differences between the change gradients of adjacent historical data, the closer their degrees of data change, and the more uniform the change trend of the historical data.
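The characteristic-coefficient formula itself is not reproduced in this text, so the following Python sketch assumes one explicit form built from the ingredients described above: gradients $\Delta_n = h_{l,n} - h_{l,n-1}$, weights $\exp(-|\Delta_{n+1}-\Delta_n|)$, a local change $\sum w_n|\Delta_n|$, the sum of gradient magnitudes as a stand-in for the accumulated change, and $K = (\text{overall}+\varepsilon)/(\text{local}+\varepsilon) - 1$. Under these assumptions $K \ge 0$, and $K = 0$ when all gradients are equal, matching the stated properties. All function names are hypothetical; the history window uses the first set number of 20.

```python
import math

def history_window(series, l, size=20):
    """Up to `size` values preceding index l (fewer near the start)."""
    return series[max(0, l - size):l]

def gradients(history):
    # Change gradient of the n-th history value: h[n] - h[n-1].
    return [b - a for a, b in zip(history, history[1:])]

def characteristic_coefficient(history, eps=1.0):
    """ASSUMED form: K = (sum|d_n| + eps) / (sum w_n*|d_n| + eps) - 1,
    with w_n = exp(-|d_{n+1} - d_n|) for every gradient that has a successor."""
    d = gradients(history)
    if len(d) < 2:
        return 0.0  # too little history to compare adjacent gradients
    pairs = list(zip(d, d[1:]))  # (d_n, d_{n+1})
    local = sum(math.exp(-abs(nxt - cur)) * abs(cur) for cur, nxt in pairs)
    overall = sum(abs(cur) for cur, _ in pairs)
    return (overall + eps) / (local + eps) - 1.0
```

A steadily rising history such as `[1, 2, 3, 4, 5]` gives `0.0`, while an erratic one such as `[1, 2, 10, 11, 30]` gives a large positive coefficient.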

Step two, obtaining the prediction data of each actual data according to the historical data and the characteristic coefficient corresponding to each actual data, and obtaining the importance index of each actual data according to the difference between the actual data and the corresponding prediction data.

Because the data sources of the market monitoring center are fixed, the actual data are relatively stable under normal conditions. When the change trend of the historical data is uniform, the differences between actual data are likely to be small and the historical data are a good reference, so the prediction data of each actual data can be obtained from its historical data and their change trend.

Based on the above, the prediction data of each actual data are obtained from its corresponding historical data and characteristic coefficient, expressed by the formula:

where $P_l$ denotes the prediction data of the $l$-th actual data; $K_l$ denotes the characteristic coefficient of the $l$-th actual data; $h_{l,N}$ denotes the data value of the $N$-th historical data corresponding to the $l$-th actual data; $\Delta_N$ denotes the change gradient of the $N$-th historical data; and $\bar{\Delta}_l$ denotes the average degree of data change of the historical data.

The closer the characteristic coefficient is to 0, the more uniform the degree of data change within the historical data and the more informative their change trend, so the data value of the prediction data is obtained from the overall change of the historical data; that is, $\bar{\Delta}_l$ dominates the change amount of the prediction data.

The larger the characteristic coefficient, the more chaotic the degree of data change within the historical data and the less meaningful their change trend as a reference, so the data value of the prediction data is obtained from the change of the historical data closest to the actual data; that is, $\Delta_N$ is used as the change amount of the prediction data.

$h_{l,N}$ is the data value of the last historical data corresponding to the $l$-th actual data. Note that the last historical data corresponding to an actual data is the one closest to it, i.e. the two are adjacent; equivalently, the last historical data corresponding to the $l$-th actual data is the $(l-1)$-th actual data, so each actual data may serve as historical data for one or more later actual data.
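The prediction formula is not reproduced in this text, so the Python sketch below assumes one blend that has the behaviour described above: $P_l = h_{l,N} + e^{-K_l}\,\bar{\Delta}_l + (1 - e^{-K_l})\,\Delta_N$, with $\bar{\Delta}_l = (h_{l,N} - h_{l,1})/(N-1)$ as the mean change gradient. When $K_l$ is near 0 the mean gradient dominates; when $K_l$ is large the last gradient $\Delta_N$ dominates. The function name `predict` and the fallbacks for very short histories are hypothetical.

```python
import math

def predict(history, k):
    """ASSUMED blend: P = h_N + exp(-k)*mean_grad + (1 - exp(-k))*last_grad."""
    if not history:
        return None                    # first actual data: no history
    if len(history) == 1:
        return float(history[0])       # hypothetical: assume no change
    mean_grad = (history[-1] - history[0]) / (len(history) - 1)
    last_grad = history[-1] - history[-2]
    w = math.exp(-k)                   # weight of the overall trend
    return history[-1] + w * mean_grad + (1.0 - w) * last_grad
```

For a uniform history such as `[1, 2, 3, 4]` with `k = 0`, the prediction is `5.0`; for `[0, 0, 0, 10]` the prediction moves from about 13.33 at `k = 0` toward 20 as `k` grows, because the recent jump takes over.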

The prediction data of an actual data are obtained from the change trend and the degree of data change of the historical data. The smaller the difference between the data value of the actual data and that of its prediction, the more consistent the change trends of the actual data and the historical data; the larger the difference, the less the actual data follow the change trend of the historical data.

Based on this, the importance index of each actual data is obtained from the difference between the actual data and its prediction data; specifically, the normalized absolute value of this difference is taken as the importance index of the actual data:

$I_l = \mathrm{sigmoid}\left(\left|x_l - P_l\right|\right)$,

where $I_l$ denotes the importance index of the $l$-th actual data, $x_l$ denotes the data value of the $l$-th actual data, $P_l$ denotes the prediction data of the $l$-th actual data, and $\mathrm{sigmoid}(\cdot)$ is the normalization function.

$|x_l - P_l|$ reflects the data difference between the $l$-th actual data and its prediction data. The larger this value, the less the $l$-th actual data fits the change trend of the historical data, the more likely a data anomaly or similar event has occurred in it, and the larger the corresponding importance index. The importance index thus reflects the importance of the actual data: the larger its value, the more important the actual data.
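The importance index is simply a logistic sigmoid applied to the absolute prediction error; a minimal Python sketch (function names hypothetical) is:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def importance_index(actual, predicted):
    # Normalised absolute prediction error: sigmoid(|x - P|).
    return sigmoid(abs(actual - predicted))
```

Because $|x_l - P_l| \ge 0$, the index always lies in $[0.5, 1)$: a perfect prediction gives exactly 0.5, and larger deviations approach 1.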

Step three, constructing a preset number of data blocks according to all actual data in the preset sampling time period, and obtaining the distribution index of each data block according to each actual data in the data block and a preset characteristic value.

First, note that the LZW algorithm feeds data into the compression module sequentially, outputting the code of a string that already exists in the dictionary and assigning a new code to a string that does not. Searching the dictionary therefore takes much of the compression time and reduces compression efficiency. The embodiment of the invention divides the actual data into blocks and places similar data blocks at adjacent positions, where similarity covers similar data proportions, similar data distributions and similar data structures. By constraining the order in which the data are compressed, this shortens the dictionary generated during compression, reduces compression time and improves compression efficiency.

Based on this, a preset number of data blocks are constructed from all actual data within the preset sampling time period. Specifically, the data values of all actual data are converted into two-dimensional data matrices of identical size, and each two-dimensional data matrix forms a data block. The implementer sets the size of the matrices according to the specific implementation scenario; once the size is fixed, the total number of data blocks, i.e. the preset number, follows from the total number of actual data. In this embodiment, each two-dimensional data matrix holds $a \times a$ data and every matrix is completely filled; $a$ is a user-defined parameter whose value depends on the performance of the compression equipment, with a default of 100.
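Splitting the data series into fixed-size matrices can be sketched in Python as follows, assuming each matrix is a square $a \times a$ matrix filled row by row. The function name `build_blocks` and the choice to return leftover values separately (rather than padding them) are hypothetical, since the text only states that each matrix is complete.

```python
def build_blocks(values, a=100):
    """Split the data series into consecutive a x a matrices (row-major).
    Trailing values that cannot fill a whole matrix are returned separately
    (an assumption; the text only says each matrix is complete)."""
    size = a * a
    n_full = len(values) // size          # number of complete data blocks
    blocks = []
    for b in range(n_full):
        chunk = values[b * size:(b + 1) * size]
        blocks.append([chunk[r * a:(r + 1) * a] for r in range(a)])
    remainder = values[n_full * size:]
    return blocks, remainder
```

With `a = 2`, the series `0..9` yields two blocks, `[[0, 1], [2, 3]]` and `[[4, 5], [6, 7]]`, plus the leftover values `[8, 9]`.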

Then, a characteristic value is preset for each data block as a representative value of all actual data in the block, reflecting the overall balance of the data in the block. In this embodiment, the characteristic value is the average of the mode and the mean of all actual data in the data block, i.e. $T_i = \frac{\bar{x}_i + \mathrm{Mo}_i}{2}$, where $\bar{x}_i$ and $\mathrm{Mo}_i$ are the mean and the mode of the actual data in the $i$-th data block.

Further, a distribution index of the data block is obtained according to each actual data in the data block and a preset characteristic value, specifically, a fluctuation degree of each actual data in the data block relative to the characteristic value is obtained, the fluctuation degree is used as the distribution index of the data block, and the distribution index is expressed by a formula as follows:

where $D_i$ denotes the distribution index of the $i$-th data block; $x_{i,a}$ denotes the data value of the $a$-th actual data in the $i$-th data block; $T_i$ denotes the characteristic value of the $i$-th data block; and $A_i$ denotes the total number of actual data contained in the $i$-th data block.

$T_i$ reflects the overall balance of the data distribution in the $i$-th data block. The calculation mimics the standard deviation of the data in the block, with the characteristic value $T_i$ taking the place of the mean of all data in the block, and thus gives the degree of fluctuation of each actual data in the block relative to the characteristic value.

When the actual data in a data block fluctuate strongly, the distribution index is large and the data distribution in the block is discrete; when they fluctuate little, the distribution index is small and the data distribution in the block is more uniform.
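The characteristic value and a distribution index can be sketched in Python as follows. The distribution-index formula is not reproduced in this text; the sketch assumes the standard-deviation analogue the description suggests, $D_i = \sqrt{\frac{1}{A_i}\sum_a (x_{i,a} - T_i)^2}$, with $T_i$ in place of the block mean. Block values are the flattened matrix, and the function names are hypothetical. Note that `statistics.mode` returns the first mode encountered when several values tie (Python 3.8 and later), which is itself an assumption about how ties are handled.

```python
import math
from statistics import mean, mode

def characteristic_value(block_values):
    # Average of the mean and the mode of all values in the block.
    return (mean(block_values) + mode(block_values)) / 2

def distribution_index(block_values):
    """ASSUMED standard-deviation-like form with T_i in place of the mean:
    D_i = sqrt((1/A_i) * sum((x_a - T_i)^2))."""
    t = characteristic_value(block_values)
    return math.sqrt(sum((x - t) ** 2 for x in block_values) / len(block_values))
```

For the block values `[1, 2, 2, 3]`, the mean and mode are both 2, so $T_i = 2$ and the index is $\sqrt{0.5}$; a constant block has an index of 0.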

Step four, obtaining the structural indexes of the data blocks according to the distribution disorder degree of actual data in the data blocks; and obtaining the fitting degree of the data block according to the importance index of the actual data in the data block, the distribution index and the structure index.

First, note that the acquired data have a structural order among themselves. The more structurally similar the data blocks, the more the dictionary can be reduced when the blocks are reordered and compressed, and the more compression efficiency improves. When the dictionary is built, the character combinations of earlier data blocks are stored in it; if a later data block contains the same combinations, it is only necessary to check whether they are already recorded in the dictionary, regardless of whether they occurred immediately before the current character. Combinations formed by the characters of one data block therefore remain available in the dictionary whenever similar characters occur in other data blocks, which improves compression efficiency while preserving the compression ratio.

Next, the degree of disorder of the data in the data block is analyzed. The difference between each pair of adjacent data in the block is calculated, and the proportion of occurrences of each difference value is counted. When the actual data in the block have a similar structural distribution, the differences between them are also similar. The structure index of the data block is obtained from the degree of disorder in the distribution of its actual data and is expressed as follows:

where the symbols denote, respectively: the structure index of the i-th data block; the data value of the a-th actual data in the i-th data block; the characteristic value of the i-th data block; the total number of actual data contained in the i-th data block; and the frequency of occurrence of the difference between the a-th and the (a-1)-th actual data in the i-th data block. ln() denotes the logarithm with the natural constant e as its base.

The distribution entropy of the differences between adjacent actual data in the block reflects how disordered those differences are, and therefore reflects the structural relationship among the actual data in the block. The more similar the differences between the actual data, the smaller the entropy and the more uniform the distribution of the actual data in the block. The more dissimilar the differences, the larger the entropy and the more disordered that distribution.

Different data blocks may exhibit the same difference distribution. To distinguish such blocks, the distribution entropy is corrected using the actual data values and the characteristic value of the block. For two data blocks, the more similar their data composition and internal structural distribution, the closer their structure indices. The structure index therefore reflects the distribution of data within the data block.
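The uncorrected distribution-entropy part of the structure index can be sketched as follows. This sketch assumes the standard Shannon form over the distinct adjacent-difference values, where p is each value's share of all adjacent differences. The patent's correction term, which uses the data values and the characteristic value, is not reproduced. The function name `difference_entropy` is hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def difference_entropy(values: Sequence[float]) -> float:
    """Shannon entropy (natural log) of the adjacent differences in one block.

    Sketch of the distribution entropy only; the correction term based on the
    characteristic value is deliberately omitted.
    """
    diffs = [values[a] - values[a - 1] for a in range(1, len(values))]
    if not diffs:
        return 0.0
    counts = Counter(diffs)
    total = len(diffs)
    # Sum of p * ln(1/p) over the relative frequency p of each distinct difference.
    return sum((c / total) * math.log(total / c) for c in counts.values())


print(difference_entropy([1, 2, 3, 4, 5]))  # all differences equal -> 0.0
print(difference_entropy([1, 4, 2, 9, 3]))  # four distinct differences -> ln 4 (about 1.386)
```

A block with evenly spaced data (identical differences) scores 0, matching the text's claim that similar differences give a smaller value.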

Finally, the characteristic information of the actual data in each data block is represented by combining three aspects: the fluctuation of the actual data, their structural distribution and their importance. That is, the fitting degree of the data block is obtained from the importance indices of its actual data, its distribution index and its structure index.

Specifically, for any data block, the mean of the importance indices of all its actual data is calculated. The difference between a preset value and this mean is taken as the adjustment coefficient. The characteristic value, distribution index and structure index of the block are then weighted and summed, and the sum is normalized. The fitting degree of the data block is the product of the adjustment coefficient and this normalized value, plus the mean.

The fitting degree is calculated as:

$$F_i = \bar{I}_i + \left(1 - \bar{I}_i\right)\cdot \mathrm{sigmoid}\left(w_1 T_i + w_2 D_i + w_3 S_i\right)$$

The symbols are defined as follows:

- $F_i$ denotes the fitting degree of the i-th data block.
- $\bar{I}_i$ denotes the mean of the importance indices of all actual data in the i-th data block.
- $T_i$ denotes the characteristic value of the i-th data block.
- $D_i$ denotes the distribution index of the i-th data block.
- $S_i$ denotes the structure index of the i-th data block.
- sigmoid() is a normalization function.
- $w_1$, $w_2$ and $w_3$ are weights. The values in this embodiment are 0.3, and 0.4, respectively, and the implementer can set them according to the specific implementation scenario.
- $1-\bar{I}_i$ is the adjustment coefficient. The preset value is 1 in this embodiment.

The mean importance index reflects how important the data in the block are. The larger it is, the more important the actual data in the block, and the less the block's data should be reordered: operations on more important data are kept to a minimum to avoid losing them. Correspondingly, less attention needs to be paid to the disorder and structural distribution of the actual data in the block. The smaller the mean importance index, the less important the actual data in the block and the larger the adjustment coefficient. More attention is then paid to the fluctuation of the actual data in the block and to the disorder of their differences.

The fitting degree is taken as the representation value of the data block. When the data in the block are more important, the representation value is dominated by the importance indices of its actual data. When they are less important, it is dominated by the fluctuation of the actual data and the disorder of their differences.
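The fitting-degree calculation described in step four can be sketched in Python. The sketch follows the prose exactly: the mean importance, plus the adjustment coefficient (preset value minus mean) times the sigmoid-normalized weighted sum of the characteristic value, distribution index and structure index. The function names are hypothetical, and the inputs are assumed to be computed already. The default weights (0.3, 0.3, 0.4) are an assumption, because the text lists only the values 0.3 and 0.4 for the three weights.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function used as the normalization."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fitting_degree(
    importance: Sequence[float],
    characteristic_value: float,
    distribution_index: float,
    structure_index: float,
    weights: tuple[float, float, float] = (0.3, 0.3, 0.4),  # assumed placeholder values
    preset: float = 1.0,  # preset value from the embodiment
) -> float:
    if not importance:
        raise ValueError("a data block must contain at least one importance index")
    mean_importance = sum(importance) / len(importance)
    adjust = preset - mean_importance  # adjustment coefficient
    w1, w2, w3 = weights
    z = w1 * characteristic_value + w2 * distribution_index + w3 * structure_index
    # Important blocks (mean near 1) are dominated by their importance;
    # unimportant ones by the normalized fluctuation/structure term.
    return mean_importance + adjust * sigmoid(z)


print(fitting_degree([1.0, 1.0], 5.0, 2.0, 1.0))  # fully important block -> 1.0
print(fitting_degree([0.0, 0.0], 0.0, 0.0, 0.0))  # unimportant, all-zero indices -> 0.5
```
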

Step five: classify the data blocks according to their fitting degrees. Merge the actual data in each class into one data stream, then compress it with the LZW algorithm for storage.

Specifically, the closer the fitting degrees of two data blocks, the more likely they are to share dictionary entries. Close representation values indicate that the blocks are similar in importance, or that their actual data have a similar degree of disorder and structural distribution. The data blocks are therefore arranged by fitting degree in a set order. In this embodiment the order is from largest to smallest, and the implementer may choose another order for a specific scenario.

The arranged data blocks are then classified by fitting degree into several categories, each containing at least two data blocks. The implementer may choose a suitable clustering algorithm for the specific scenario. Data blocks within a category are close in importance or similar in data distribution, so the actual data of the blocks in each category can be combined into one data stream.

In this embodiment, the two-dimensional data matrix of each data block in a category is converted into a one-dimensional data sequence. These sequences are concatenated in a set order, namely descending fitting degree, to form the category's data stream. The LZW algorithm is then applied to compress the data stream of each category.
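The ordering, classification and flattening steps can be sketched as shown below. The patent leaves the clustering algorithm to the implementer, so this sketch substitutes a simple hypothetical rule. A new category starts whenever the gap to the previous block's fitting degree exceeds `tolerance`, and any singleton is merged into a neighbouring category so that each category has at least two blocks. Row-major flattening of each block's matrix is also an assumption. The names `build_streams` and `tolerance` are hypothetical.

```python
from typing import Sequence

Matrix = list[list[int]]


def build_streams(
    blocks: Sequence[Matrix],
    fits: Sequence[float],
    tolerance: float = 0.05,  # hypothetical grouping threshold
) -> list[list[int]]:
    # Sort block indices by fitting degree, largest first (the embodiment's order).
    order = sorted(range(len(blocks)), key=lambda k: fits[k], reverse=True)

    # Hypothetical stand-in for clustering: split where adjacent fits differ a lot.
    groups: list[list[int]] = []
    for k in order:
        if groups and fits[groups[-1][-1]] - fits[k] <= tolerance:
            groups[-1].append(k)
        else:
            groups.append([k])

    # Merge singleton groups so every category holds at least two blocks
    # (possible whenever there are at least two blocks in total).
    merged: list[list[int]] = []
    for g in groups:
        if merged and len(g) < 2:
            merged[-1].extend(g)
        else:
            merged.append(g)
    if len(merged) > 1 and len(merged[0]) < 2:
        first = merged.pop(0)
        merged[0][:0] = first  # prepend keeps descending fitting-degree order

    # Flatten each block row by row and concatenate blocks in category order.
    streams: list[list[int]] = []
    for g in merged:
        stream: list[int] = []
        for k in g:
            for row in blocks[k]:
                stream.extend(row)
        streams.append(stream)
    return streams


blocks = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 9], [9, 9]]]
print(build_streams(blocks, [0.50, 0.52, 0.90]))
```

In the usage example, the block with fitting degree 0.90 would form a singleton, so it is merged with the other two. The result is one stream that starts with the highest-fitting block.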

Note that each category corresponds to one data stream and one dictionary. The LZW algorithm is well known and is only outlined here. For any data stream, take the first data item as P and the second as C. The compression rule is as follows:

(1) If P + C is in the dictionary, set P = P + C without producing output, read the next data item as C, and repeat this step.

(2) If P + C is not in the dictionary, output the dictionary code of P, add P + C to the dictionary, and set P = C.

When the data stream is exhausted, the code of the remaining P is output.
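The rule above is the standard LZW encoder. Below is a minimal sketch of it. The sketch assumes that the data stream is a byte string and that the dictionary starts with the 256 single-byte entries, so a category's stream of integer values would need to fit in 0..255, for example `bytes(stream)`. The function name `lzw_compress` is hypothetical.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Standard LZW encoding over an initial dictionary of all single bytes."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    p = b""  # current prefix P
    out: list[int] = []
    for byte in data:
        c = bytes([byte])  # next data item C
        pc = p + c
        if pc in dictionary:
            # Rule (1): extend the prefix without output.
            p = pc
        else:
            # Rule (2): emit code of P, record P + C, restart from C.
            out.append(dictionary[p])
            dictionary[pc] = next_code
            next_code += 1
            p = c
    if p:
        out.append(dictionary[p])  # flush the final prefix
    return out


print(lzw_compress(b"ABABABA"))  # [65, 66, 256, 258]
```

Repeated patterns such as "AB" and "ABA" are replaced by the new codes 256 and 258. This is how blocks with similar structure, placed in the same stream, benefit from a shared dictionary.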

Finally, once the compressed data for each category are obtained, they are stored as required by the market supervision and monitoring center. To extract and decompress data, an extraction instruction is sent to the storage device. On receiving the instruction, the storage unit retrieves the compressed data together with the corresponding dictionary.

In this embodiment, the decompression method is as follows. The compressed data are decompressed with the dictionary using the LZW algorithm, a well-known technique not described in detail here. The decompressed data are restored to the order they had before compression and converted from decimal to binary. The binary data are then split into 7-bit segments, and a 0 is prepended as the leading bit of each segment.
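The LZW decompression step can be sketched with the standard decoder. The sketch assumes the same 256-entry single-byte initial dictionary as on the compression side. A standard LZW decoder rebuilds the dictionary from the code sequence itself, so this sketch does not use a stored dictionary. The subsequent reordering and 7-bit restoration steps are not shown. The function name `lzw_decompress` is hypothetical.

```python
def lzw_decompress(codes: list[int]) -> bytes:
    """Standard LZW decoding, rebuilding the dictionary while reading codes."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    prev = dictionary[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Special case: the code refers to the entry being defined right now.
            entry = prev + prev[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        # Mirror the encoder: add previous string plus first byte of current one.
        dictionary[next_code] = prev + entry[:1]
        next_code += 1
        prev = entry
    return b"".join(out)


print(lzw_decompress([65, 66, 256, 258]))  # b'ABABABA'
```
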

The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; the modifications or substitutions do not make the essence of the corresponding technical solutions deviate from the technical solutions of the embodiments of the present application, and are included in the protection scope of the present application.

Claims

1. An intelligent storage method for market supervision and monitoring center data, characterized in that it comprises the following steps:

2. The intelligent storage method for the data of the market supervision and monitoring center according to claim 1, wherein the method for obtaining the fitting degree of the data block specifically comprises the following steps:

3. The intelligent storage method for the data of the market supervision and monitoring center according to claim 1, wherein the method for acquiring the characteristic coefficient specifically comprises the following steps:

for any actual data, acquiring a first set number of historical data corresponding to the actual data, recording a difference value between two adjacent historical data as a change gradient of the historical data, and acquiring a data difference between the last historical data and the first historical data corresponding to the actual data; and obtaining the characteristic coefficient of the actual data according to the data difference and the change gradient of the historical data.

4. The intelligent storage method for the data of the market supervision and monitoring center according to claim 3, wherein the calculation formula of the characteristic coefficient is specifically as follows:

where the symbols denote, respectively: the characteristic coefficient of the i-th actual data; the data value of the last historical data corresponding to the i-th actual data; the data value of the first historical data corresponding to the i-th actual data; the change gradient of the n-th historical data; the change gradient of the (n+1)-th historical data; and the number of historical data corresponding to the i-th actual data. exp() denotes the exponential function with the natural constant e as its base, and ε denotes a hyperparameter.

5. The intelligent storage method for the data of the market supervision and monitoring center according to claim 1, wherein the method for obtaining the forecast data specifically comprises:

where the symbols denote, respectively: the prediction data of the i-th actual data; the data value of a historical data item corresponding to the i-th actual data; the data value of the first historical data corresponding to the i-th actual data; and the change gradient of the corresponding historical data.

6. The intelligent storage method for the data of the market supervision and monitoring center according to claim 1, wherein the method for obtaining the importance index specifically comprises the following steps:

7. The intelligent storage method for the data of the market supervision and monitoring center according to claim 1, wherein the obtaining of the distribution index of the data block according to each actual data and the preset characteristic value in the data block specifically comprises:

8. The intelligent storage method for the data of the market supervision and monitoring center according to claim 1, wherein the obtaining of the structural index of the data block according to the degree of distribution confusion among the actual data in the data block is specifically as follows:

where the symbols denote, respectively: the structure index of the i-th data block; the data value of the a-th actual data in the i-th data block; the characteristic value of the i-th data block; and the total number of actual data contained in the i-th data block.

9. The intelligent storage method for the data of the market supervision and monitoring center according to claim 1, wherein the preset method of the characteristic value is specifically as follows:

10. The method for intelligently storing the data of the market supervision and monitoring center according to claim 1, wherein the constructing of the data blocks of the preset number according to all the actual data in the preset sampling time period specifically comprises: