CN114969060B - Industrial equipment time sequence data compression storage method and device - Google Patents

Industrial equipment time sequence data compression storage method and device

Info

Publication number
CN114969060B
Authority
CN
China
Prior art keywords
data
time sequence
sequence data
time
difference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210913155.9A
Other languages
Chinese (zh)
Other versions
CN114969060A (en
Inventor
吴伟
刘润新
蔡正心
董守镏
叶佩炜
唐雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Mulian Internet Of Things Technology Co ltd
Original Assignee
Zhejiang Mulian Internet Of Things Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Mulian Internet Of Things Technology Co ltd
Priority to CN202210913155.9A
Publication of CN114969060A
Application granted
Publication of CN114969060B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2272 Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/04 Manufacturing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Computing Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Manufacturing & Machinery (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application relates to a method and a device for compressing and storing time series data of industrial equipment. The method comprises: acquiring the time series data collected at the current time; calculating the deviation between this time series data and the time series data recorded last time; recording the time series data when the deviation exceeds a dead zone range; acquiring the time series data recorded for multiple times and counting the frequency of occurrence of each characteristic value in it; calculating the information entropy of each characteristic value from the counted frequencies; obtaining the weight of each characteristic value according to the proportion of its information entropy in the total information entropy, where the total information entropy is the sum of the information entropies of all characteristic values; and encoding the time series data recorded for multiple times by interval coding, taking each characteristic value as a symbol, the weight of the characteristic value as the probability used for interval coding, and the sorted order of the characteristic values as the order of the symbols, then obtaining and storing the encoding result. The method can improve data compression efficiency and reduce cost.

Description

Industrial equipment time sequence data compression storage method and device
Technical Field
The present application relates to the field of data compression technologies, and in particular, to a method and an apparatus for compressing and storing time series data of an industrial device.
Background
In industrial production, equipment runs stably for long periods and repeats the same operations according to fixed control logic. Controlling the equipment requires collecting its operating data in real time, and because the collection frequency is high, the accumulated data occupies a very large storage space over time, which raises storage cost. With the continuous development of the information industry, the amount of information in industrial equipment data acquisition systems has also grown explosively. Moreover, the number of data acquisition points across a whole process industry is typically in the thousands to hundreds of thousands, and the acquisition interval is on the order of seconds, so these systems are characterized by large data volume and low value density. These characteristics make data analysis cumbersome and make useful information hard to extract. To reduce the hardware cost of data storage, cut redundant data and raise the value of the data, industrial equipment data needs to be compressed before storage.
At present, manufacturers and research institutions at home and abroad have proposed many data compression methods. Lossy compression algorithms include the swinging door (revolving door) compression algorithm, the backward slope compression algorithm and the dead zone limit compression algorithm; lossless compression algorithms include Shannon coding, Huffman coding and arithmetic coding.
However, in a real industrial equipment data acquisition system, a single lossy or lossless compression scheme rarely achieves the best compression efficiency. In addition, a lossy compression algorithm needs extensive testing and parameter tuning to balance the compression ratio against the average decompression error, which in practice costs experienced engineers a great deal of time and further increases an enterprise's labor cost. In summary, existing compression methods have low compression efficiency and high compression cost.
Disclosure of Invention
In view of the above, it is necessary to provide a method and an apparatus for compressing and storing time series data of industrial equipment, which can improve compression efficiency and reduce compression cost.
An industrial equipment time sequence data compression storage method comprises the following steps:
acquiring time sequence data acquired at the current time;
calculating the deviation of the time sequence data and the time sequence data recorded last time;
when the deviation exceeds a dead zone range, recording the time sequence data;
acquiring time sequence data recorded for multiple times, and counting the occurrence frequency of each characteristic value in the time sequence data recorded for multiple times;
calculating the information entropy of each characteristic value according to the statistical frequency;
obtaining the weight of each characteristic value according to the proportion of the information entropy of that characteristic value in the total information entropy; wherein the total information entropy is the sum of the information entropies of all the characteristic values;
and encoding the time sequence data recorded for multiple times by interval coding, taking each characteristic value as a symbol, the weight of the characteristic value as the probability used for interval coding, and the sorted order of the characteristic values as the order of the symbols, and obtaining and storing an encoding result.
In one embodiment, before recording the time series data when the deviation exceeds the dead zone range, the method includes: according to the compressed data stored last time, acquiring original data before compression and decompressed data after decompression, and calculating a decompression average error; calculating the difference value between the decompression average error and the standard decompression average error, and judging whether the difference value is a positive number; if the difference is a positive number, reducing the dead zone limit value; if the difference is negative, increasing the dead zone limit value; if the difference is zero, the dead zone limit value is unchanged; wherein the dead band range is determined from the dead band limit.
In one embodiment, the reducing the dead band limit if the difference is a positive number includes: if the difference is a positive number, calculating a dead zone limit adjustment value according to the difference and the adjustment proportion, and reducing the dead zone limit according to the adjustment value; if the difference is negative, increasing the dead zone limit value, including: and if the difference is a negative number, calculating a dead zone limit adjusting value according to the difference and the adjusting proportion, and increasing the dead zone limit according to the dead zone limit adjusting value.
In one embodiment, after calculating a difference between the decompressed average error and the standard decompressed average error and determining whether the difference is a positive number, the method includes: if the difference value between the decompression average error calculated for a plurality of times continuously and the standard decompression average error is positive, increasing the adjustment proportion; and if the difference value of the decompression average error calculated for a plurality of times in succession and the standard decompression average error is negative, reducing the adjustment proportion.
In one embodiment, the obtaining original data before compression and decompressed data after decompression according to the compressed data stored last time, and calculating a decompressed average error includes: acquiring original data before compression according to the compressed data stored last time, wherein the original data comprises all time sequence data acquired by the last compression; according to the compressed data stored last time, performing inverse coding of interval coding on the compressed data, and decompressing by a decompression algorithm to obtain decompressed data; and calculating the decompression average error according to the original data and the decompression data. The compressed data stored last time is the compressed data of the previous time of the current compression, and the original data is the data before the dead zone limit value compression in the previous compression process.
In one embodiment, the time sequence data recorded for multiple times is time sequence data recorded for a preset number of times; or, the time sequence data recorded for multiple times is the time sequence data recorded in the preset time period.
In one embodiment, the deadband range is determined based on the last recorded timing data and a deadband limit.
An industrial equipment time series data compression storage device, the device comprising:
the data acquisition module is used for acquiring time sequence data acquired at the current time;
the deviation calculation module is used for calculating the deviation of the time sequence data and the time sequence data recorded last time;
the data recording module is used for recording the time sequence data when the deviation exceeds the dead zone range;
the frequency counting module is used for acquiring the time sequence data recorded for multiple times and counting the frequency of each characteristic value in the time sequence data recorded for multiple times;
the information entropy calculation module is used for calculating the information entropy of each characteristic value according to the statistical frequency;
the weight calculation module is used for obtaining the weight of each characteristic value according to the proportion of the information entropy of that characteristic value in the total information entropy; wherein the total information entropy is the sum of the information entropies of all the characteristic values;
and the compression storage module is used for encoding the time sequence data recorded for multiple times by interval coding, taking each characteristic value as a symbol, the weight of the characteristic value as the probability used for interval coding, and the sorted order of the characteristic values as the order of the symbols, and obtaining and storing an encoding result.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring time sequence data acquired at the current time;
calculating the deviation of the time sequence data and the time sequence data recorded last time;
when the deviation exceeds a dead zone range, recording the time sequence data;
acquiring time sequence data recorded for multiple times, and counting the occurrence frequency of each characteristic value in the time sequence data recorded for multiple times;
calculating the information entropy of each characteristic value according to the statistical frequency;
obtaining the weight of each characteristic value according to the proportion of the information entropy of that characteristic value in the total information entropy; wherein the total information entropy is the sum of the information entropies of all the characteristic values;
and encoding the time sequence data recorded for multiple times by interval coding, taking each characteristic value as a symbol, the weight of the characteristic value as the probability used for interval coding, and the sorted order of the characteristic values as the order of the symbols, and obtaining and storing an encoding result.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring time sequence data acquired at the current time;
calculating the deviation of the time sequence data and the time sequence data recorded last time;
when the deviation exceeds a dead zone range, recording the time sequence data;
acquiring time sequence data recorded for multiple times, and counting the occurrence frequency of each characteristic value in the time sequence data recorded for multiple times;
calculating the information entropy of each characteristic value according to the statistical frequency;
obtaining the weight of each characteristic value according to the proportion of the information entropy of that characteristic value in the total information entropy; wherein the total information entropy is the sum of the information entropies of all the characteristic values;
and encoding the time sequence data recorded for multiple times by interval coding, taking each characteristic value as a symbol, the weight of the characteristic value as the probability used for interval coding, and the sorted order of the characteristic values as the order of the symbols, and obtaining and storing an encoding result.
According to the above method, device, computer equipment and storage medium for compressing and storing time sequence data of industrial equipment, lossy compression and lossless compression are combined. On one hand, a large amount of low-value data is eliminated, which improves compression efficiency; on the other hand, the probabilities used for interval coding are calculated from the information entropy, so that characteristic values with high information value are given larger coding probabilities and are retained to the greatest extent by the interval coding, which improves compression accuracy.
Drawings
FIG. 1 is a diagram of an embodiment of an application environment of a method for compressing and storing time series data of an industrial device;
FIG. 2 is a schematic flow chart illustrating a method for compressing and storing time series data of an industrial device according to an embodiment;
FIG. 3 is a flow chart illustrating the dead band limit adjustment step in one embodiment;
FIG. 4 is a schematic diagram of the coordinates of the dead band limit compression method in one embodiment;
FIG. 5 is a flowchart illustrating the interval coding step according to an embodiment;
FIG. 6 is a block diagram of an embodiment of an industrial plant time series data compression storage apparatus;
FIG. 7 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the present application and are not intended to limit it.
The method for compressing and storing the time series data of the industrial equipment can be applied in the application environment shown in fig. 1, in which the data collected by the industrial equipment 102 is sent over a network to the server 104 for storage. The server 104 acquires the time sequence data collected at the current time; calculates the deviation between this time sequence data and the time sequence data recorded last time; records the time sequence data when the deviation exceeds a dead zone range; acquires the time sequence data recorded for multiple times and counts the frequency of occurrence of each characteristic value in it; calculates the information entropy of each characteristic value from the counted frequencies; obtains the weight of each characteristic value according to the proportion of its information entropy in the total information entropy, where the total information entropy is the sum of the information entropies of all the characteristic values; and encodes the time sequence data recorded for multiple times by interval coding, taking each characteristic value as a symbol, the weight of the characteristic value as the probability used for interval coding, and the sorted order of the characteristic values as the order of the symbols, then obtains and stores the encoding result. The industrial equipment 102 may be any of various kinds of production, processing and packaging equipment, and the server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
It should be noted that, in the present application, information entropy is a quantification of information and refers to the expected amount of information carried by all possible events. Lossy data compression refers to a compression method in which the data obtained after compression and decompression differs from the original data but is very close to it. Lossless data compression refers to a compression method in which the data reconstructed (also called restored or decompressed) from the compressed data is exactly the same as the original data, although its compression ratio is usually lower than that of lossy data compression.
In one embodiment, a method for compressing and storing time series data of industrial equipment is provided to address the large volume of operating data, repeated data, low data value density, and complicated data analysis and visualization in industrial equipment data acquisition systems. For the same batch of production or processing material, the physical properties are relatively close, so the raw data of industrial control equipment fluctuates in the short term but is substantially stable over the long term. The method comprises the following steps:
S110, acquiring the time sequence data collected at the current time.
During operation, the industrial equipment continuously collects operating data through sensors, and the operating data collected by the sensors is sent to the memory for storage. The time sequence data here is the data collected in a single acquisition.
S120, calculating the deviation between the time sequence data and the time sequence data recorded last time.
This step applies when the time sequence data is not the first compressed data point. When the time sequence data is the first compressed data point, it is recorded directly. Each recorded piece of time sequence data is referred to as a compressed data point.
S130, recording the time sequence data when the deviation exceeds the dead zone range.
The deviation equals the absolute value of the difference between the time sequence data collected at the current time and the time sequence data recorded last time. Recording the time sequence data means storing it. As shown in fig. 4, the dead zone range is the numerical range bounded by the upper and lower broken lines that are symmetric about the broken line through A, and each circle in the figure represents one piece of time sequence data. The deviations of the five data after circle A from time sequence data A are within the dead zone range, so those data need not be recorded; the deviation of time sequence data B from time sequence data A exceeds the dead zone range, so time sequence data B is recorded. Similarly, the deviations of the five data after circle B from time sequence data B are within the dead zone range, so nothing is recorded until time sequence data C appears; the deviation of time sequence data C from time sequence data B exceeds the dead zone range, so time sequence data C is recorded.
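The recording rule of steps S110 to S130 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `deadband_filter` and the sample values are hypothetical, and the dead zone range is assumed to be the symmetric band of half-width `limit` around the last recorded value.

```python
def deadband_filter(samples, limit):
    """Return the (index, value) pairs that the dead zone rule would record.

    Assumption: the dead zone range is [last - limit, last + limit],
    where `last` is the most recently recorded value.
    """
    recorded = []
    last = None
    for idx, value in enumerate(samples):
        # The first sample is recorded directly; later samples are recorded
        # only when their deviation from the last recorded value exceeds
        # the dead zone limit.
        if last is None or abs(value - last) > limit:
            recorded.append((idx, value))
            last = value
    return recorded


# Hypothetical sensor readings: small fluctuations with two larger jumps.
samples = [10.0, 10.2, 9.9, 10.1, 10.3, 9.8, 11.5, 11.4, 11.6, 13.0]
print(deadband_filter(samples, limit=1.0))
# [(0, 10.0), (6, 11.5), (9, 13.0)]
```

Only three of the ten samples are kept here, which mirrors the A, B, C pattern described for fig. 4.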
S140, acquiring the time sequence data recorded for multiple times, and counting the frequency of occurrence of each characteristic value in it.
Compressing and storing the time sequence data from multiple recordings in a single pass improves compression efficiency. A characteristic value is a numerical value that occurs in the time sequence data recorded for multiple times. For example, if the time sequence data is 901321, the characteristic values are 9, 0, 1, 3 and 2: 9 is the first characteristic value, with frequency of occurrence 1; 0 is the second, with frequency 1; 1 is the third, with frequency 2; 3 is the fourth, with frequency 1; and 2 is the fifth, with frequency 1.
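The counting in step S140 can be reproduced for the 901321 example with Python's standard `collections.Counter`. The helper name `feature_frequencies` is hypothetical; the sketch follows the example above in treating each digit as one characteristic value.

```python
from collections import Counter


def feature_frequencies(recorded):
    """Count how often each characteristic value occurs.

    Counter keeps keys in first-seen order, which matches the
    "first, second, third ..." numbering used in the example.
    """
    return dict(Counter(recorded))


digits = [int(ch) for ch in "901321"]
freqs = feature_frequencies(digits)
print(freqs)
# {9: 1, 0: 1, 1: 2, 3: 1, 2: 1}
```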
S150, calculating the information entropy of each characteristic value according to the counted frequencies.
The information entropy of each characteristic value is calculated as:
$$H(i) = -p(i)\,\log p(i)$$
where H(i) is the information entropy of the i-th characteristic value and p(i) is the probability of occurrence of the i-th characteristic value. The probability of occurrence of a characteristic value equals the ratio of its frequency of occurrence to the sum of the frequencies of occurrence of all characteristic values. For example, if the time sequence data is 901321, the probability of occurrence of characteristic value 1 is 2/(1+1+2+1+1) = 1/3.
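A short sketch of step S150 for the 901321 example follows. It assumes the per-value entropy takes the usual Shannon form, minus p(i) times the logarithm of p(i), and it assumes base 2 for the logarithm; the source text does not state the base. The function name `feature_entropies` is hypothetical.

```python
import math


def feature_entropies(freqs):
    """Per-value entropy term, assuming H(i) = -p(i) * log2(p(i)).

    `freqs` maps each characteristic value to its frequency of occurrence.
    The base-2 logarithm is an assumption of this sketch.
    """
    total = sum(freqs.values())
    entropies = {}
    for value, count in freqs.items():
        p = count / total  # probability of occurrence of this value
        entropies[value] = -p * math.log2(p)
    return entropies


freqs = {9: 1, 0: 1, 1: 2, 3: 1, 2: 1}  # frequencies for 901321
H = feature_entropies(freqs)
for value, h in H.items():
    print(value, round(h, 4))
```

For this input, characteristic value 1 has probability 1/3 and the other values have probability 1/6 each.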
S160, obtaining the weight of each characteristic value according to the proportion of its information entropy in the total information entropy; wherein the total information entropy is the sum of the information entropies of all the characteristic values.
The weight of each characteristic value is calculated as:
$$W(i) = \frac{H(i)}{H_{\text{total}}}$$
where W(i) is the weight of the i-th characteristic value, H(i) is the information entropy of the i-th characteristic value, and H_total is the total information entropy, which equals the sum of the information entropies of all the characteristic values:
$$H_{\text{total}} = \sum_{i} H(i)$$
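Step S160 can be sketched as a normalisation of the per-value entropies, taking the weight to be each value's share of the total information entropy as the text describes. The function name `entropy_weights` is hypothetical, the Shannon form with base 2 is assumed for the entropy term, and the equal-weight fallback for a single characteristic value is an assumption added here to avoid dividing by zero.

```python
import math


def entropy_weights(freqs):
    """Weight of each characteristic value: W(i) = H(i) / H_total.

    Assumes H(i) = -p(i) * log2(p(i)). Because the weights are ratios of
    entropies, the choice of logarithm base does not change them.
    """
    total = sum(freqs.values())
    H = {v: -(c / total) * math.log2(c / total) for v, c in freqs.items()}
    H_total = sum(H.values())
    if H_total == 0:
        # Only one characteristic value occurs; fall back to equal weights
        # (an assumption of this sketch, not taken from the source).
        return {v: 1 / len(H) for v in H}
    return {v: h / H_total for v, h in H.items()}


weights = entropy_weights({9: 1, 0: 1, 1: 2, 3: 1, 2: 1})
for value, w in weights.items():
    print(value, round(w, 4))
```

The weights sum to 1, so they can be used directly as the symbol probabilities of an interval coder.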
S170, encoding the time series data recorded for multiple times by interval coding, taking each characteristic value as a symbol, the weight of the characteristic value as the probability used for interval coding, and the sorted order of the characteristic values as the order of the symbols, and obtaining and storing an encoding result.
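To make step S170 concrete, here is a minimal sketch of interval narrowing with exact fractions, in the style of textbook arithmetic or range coding. It is not the patent's coder: production range coders use fixed-width integers with renormalisation, and the function names and the integer weights below are hypothetical. The weights play the role of symbol probabilities, and the dictionary order plays the role of the symbol order.

```python
from fractions import Fraction


def build_intervals(weights):
    """Split [0, 1) into one sub-interval per symbol, in dict order.

    Each symbol's share of the interval is its weight divided by the sum
    of all weights. Every weight is assumed to be greater than zero.
    """
    fracs = {s: Fraction(w) for s, w in weights.items()}
    total = sum(fracs.values())
    intervals, low = {}, Fraction(0)
    for symbol, w in fracs.items():
        high = low + w / total
        intervals[symbol] = (low, high)
        low = high
    return intervals


def encode(symbols, intervals):
    """Narrow [0, 1) once per symbol and return a number inside the result."""
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        s_low, s_high = intervals[s]
        low, width = low + width * s_low, width * (s_high - s_low)
    return low + width / 2  # midpoint of the final interval


def decode(code, length, intervals):
    """Recover `length` symbols by repeatedly locating and rescaling `code`."""
    out = []
    for _ in range(length):
        for symbol, (s_low, s_high) in intervals.items():
            if s_low <= code < s_high:
                out.append(symbol)
                code = (code - s_low) / (s_high - s_low)
                break
    return out


# Hypothetical weights, roughly the entropy weights of the 901321 example
# scaled to integers; the key order is the symbol order.
weights = {9: 19, 0: 19, 1: 23, 3: 19, 2: 19}
intervals = build_intervals(weights)
message = [9, 0, 1, 3, 2, 1]
code = encode(message, intervals)
print(decode(code, len(message), intervals) == message)
```

Because both sides build the same intervals from the same weights and symbol order, decoding is exact even though the weights differ from the raw occurrence probabilities. The decoder needs the weights, the symbol order and the message length alongside the code value.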
In the above method for compressing and storing the time series data of the industrial equipment, lossy compression and lossless compression are combined. On one hand, a large amount of low-value data is removed, which improves compression efficiency; on the other hand, the probabilities used for interval coding are calculated from the information entropy, so that characteristic values with high information value are given larger coding probabilities and are retained to the greatest extent by the interval coding, which improves compression accuracy.
In one embodiment, before recording the time series data when the deviation exceeds the dead zone range, the method includes: according to the compressed data stored last time, acquiring original data before compression and decompressed data after decompression, and calculating a decompression average error; calculating the difference value between the decompression average error and the standard decompression average error, and judging whether the difference value is a positive number; if the difference is a positive number, reducing the dead zone limit value; if the difference is negative, increasing the dead zone limit value; if the difference is zero, the dead zone limit value is unchanged; wherein the dead band range is determined from the dead band limit.
The decompression average error is calculated by the following formula:
[formula image not available: decompression average error δ]
where δ is the decompression average error, y_0, y_1, …, y_n are the n pieces of original data, and the corresponding symbols of the formula denote the decompressed data obtained for those n pieces of original data. Based on the set standard decompression average error, the difference of the decompression average error is calculated as:
$$\Delta\delta = \delta - \delta_{\text{standard}}$$
where δ_standard denotes the set standard decompression average error.
In one embodiment, the reducing the dead band limit if the difference is a positive number includes: if the difference is a positive number, calculating a dead zone limit adjustment value according to the difference and the adjustment proportion, and reducing the dead zone limit according to the adjustment value; if the difference is negative, increasing the dead zone limit value, including: and if the difference is a negative number, calculating a dead zone limit adjusting value according to the difference and the adjusting proportion, and increasing the dead zone limit according to the dead zone limit adjusting value.
The dead zone limit adjustment value is y = Δδ × k, where Δδ is the difference between the decompression average error and the standard decompression average error, and k is the adjustment proportion.
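The adjustment rule can be sketched as follows. The function name `adjust_deadband_limit` and the numeric values are hypothetical. The sketch treats y = Δδ × k as a signed quantity, so subtracting it shrinks the limit when the error is above the standard and enlarges it when the error is below; clamping the limit at zero is an assumption added here.

```python
def adjust_deadband_limit(limit, avg_error, standard_error, k):
    """Return the dead zone limit after one adjustment step.

    diff > 0: decompression error too large, so the limit is reduced.
    diff < 0: error below the standard, so the limit is increased.
    diff == 0: the limit is unchanged.
    """
    diff = avg_error - standard_error  # difference of the average error
    y = diff * k                       # dead zone limit adjustment value
    return max(limit - y, 0.0)         # clamp at zero (assumption)


print(adjust_deadband_limit(1.0, 0.30, 0.20, k=2.0))  # error too large
print(adjust_deadband_limit(1.0, 0.10, 0.20, k=2.0))  # error below standard
print(adjust_deadband_limit(1.0, 0.20, 0.20, k=2.0))  # error on target
```

A smaller limit records more points and lowers the decompression error at the cost of compression ratio, which is why a positive difference reduces the limit.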
Specifically, as shown in fig. 3, the dead zone limit is adjusted as follows:
S121, according to the compressed data stored last time, acquiring original data before compression and decompressed data after decompression, and calculating the decompression average error;
S122, calculating the difference between the decompression average error and the standard decompression average error;
S123, judging whether the difference is a positive number;
S124, if the difference is a positive number, reducing the dead zone limit in proportion;
S125, judging whether the difference is a negative number;
S126, if the difference is a negative number, increasing the dead zone limit.
In one embodiment, after calculating the difference between the decompression average error and the standard decompression average error and determining whether the difference is positive, the method further includes: if the difference between the decompression average error and the standard decompression average error is positive for several consecutive calculations, increasing the adjustment proportion; and if the difference is negative for several consecutive calculations, reducing the adjustment proportion.
The amount by which the adjustment proportion is increased or reduced is set as required.
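The adaptation of the adjustment proportion can be sketched as below. The names `update_adjustment_ratio`, `run_length`, `step`, and `k_min` are hypothetical, and the values chosen for them are placeholders, since the text leaves both the number of consecutive calculations and the step size to be set as required.

```python
def update_adjustment_ratio(k, recent_diffs, run_length=3, step=0.5, k_min=0.0):
    """Adapt the adjustment proportion k from the most recent differences.

    recent_diffs holds the differences (decompression average error minus
    standard decompression average error), oldest first.
    """
    if len(recent_diffs) < run_length:
        return k                          # not enough history yet
    window = recent_diffs[-run_length:]
    if all(d > 0 for d in window):        # consistently positive: adjust faster
        return k + step
    if all(d < 0 for d in window):        # consistently negative: adjust more gently
        return max(k - step, k_min)       # the lower bound is an assumption
    return k                              # mixed signs: keep k as it is


if __name__ == "__main__":
    print(update_adjustment_ratio(1.0, [0.2, 0.1, 0.3]))
```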
In one embodiment, obtaining the original data before compression and the decompressed data after decompression according to the compressed data stored last time, and calculating the decompression average error, includes: obtaining the original data before compression according to the compressed data stored last time, where the original data includes all time sequence data acquired for the last compression; performing inverse coding of the interval coding on the compressed data stored last time, and decompressing the result with a decompression algorithm to obtain the decompressed data; and calculating the decompression average error from the original data and the decompressed data. The compressed data stored last time is the compressed data produced by the compression immediately preceding the current one, and the original data is the data as it was before dead zone limit compression in that preceding compression.
The decompression algorithm uses a curve fitting method such as linear least squares, polynomial fitting, or least squares optimization. A curve is fitted to the data obtained by inverse coding of the interval coding on the stored compressed data; function values are then read from the fitted curve at the data acquisition times, and these function values are used as decompressed data. For example, as shown in Fig. 4, inverse coding of the interval coding on the stored compressed data yields the data A, B, and C. Because interval coding is lossless compression, the inverse coding returns A, B, and C unchanged. Curve fitting on A, B, and C yields a function of these data. Before dead zone limit compression, four data points lay between A and B and between B and C, so the corresponding function values are taken from the fitted function at the acquisition times of the discarded data and used as the values restored from dead zone limit compression. The data A, B, and C together with the selected function values form the decompressed data, ordered by acquisition time.
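A minimal sketch of this reconstruction, using a straight line fitted by linear least squares, is given below. The sample values for A, B, and C, the unit acquisition interval, the reading of the example as four discarded points in each gap, and the choice of mean absolute error as the decompression average error are all assumptions made for illustration; the text does not define the error metric.

```python
def fit_line(points):
    """Least-squares line through (time, value) points; returns (slope, intercept)."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in points)
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_v - slope * mean_t


def decompress(kept, all_times):
    """Rebuild a value for every acquisition time, ordered by time."""
    slope, intercept = fit_line(kept)
    kept_values = dict(kept)
    # Retained points keep their stored value; discarded ones are read off the fitted line.
    return [(t, kept_values.get(t, slope * t + intercept)) for t in sorted(all_times)]


def average_error(original, decompressed):
    """Mean absolute error between original and decompressed values (assumed metric)."""
    restored = dict(decompressed)
    return sum(abs(v - restored[t]) for t, v in original) / len(original)


if __name__ == "__main__":
    kept = [(0, 10.0), (5, 20.0), (10, 30.0)]              # A, B, C after inverse coding
    original = [(t, 10.0 + 2.0 * t) for t in range(11)]    # four discarded points per gap
    original[3] = (3, 16.5)                                # one sample off the line
    restored = decompress(kept, [t for t, _ in original])
    print(restored)
    print(average_error(original, restored))
```

A higher-degree polynomial could replace `fit_line` without changing the other two functions.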
In one embodiment, the time sequence data recorded multiple times is the time sequence data recorded a preset number of times, or the time sequence data recorded within a preset time period.
In one embodiment, the dead zone range is determined from the time sequence data recorded last time and the dead zone limit.
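The recording rule can be sketched as follows, assuming the dead zone range is the symmetric band of width ± limit around the last recorded value, that a value exactly on the boundary does not exceed the range, and that the first sample is always recorded. These three points, and the name `dead_zone_filter`, are assumptions of the sketch rather than statements from the text.

```python
def dead_zone_filter(samples, limit):
    """Keep a sample only when it leaves the dead zone around the last recorded one."""
    recorded = []
    for t, value in samples:
        if not recorded:
            recorded.append((t, value))      # first sample: nothing to compare against
            continue
        last = recorded[-1][1]
        low, high = last - limit, last + limit   # dead zone range from last record and limit
        if value < low or value > high:          # deviation exceeds the dead zone range
            recorded.append((t, value))
    return recorded


if __name__ == "__main__":
    samples = [(0, 10.0), (1, 10.5), (2, 11.5), (3, 11.0), (4, 13.0)]
    print(dead_zone_filter(samples, limit=1.0))
```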
In one embodiment, the interval coding steps are as follows:
S171, selecting a relatively large numeric interval;
S172, dividing the numeric interval into a plurality of subintervals according to the weights of the feature values;
S173, determining whether every item of time series data can be matched to a subinterval at the lowest level;
S174, if so, compressing the time series data recorded multiple times in the form of subinterval codes.
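Steps S171 to S174 can be illustrated with a small interval coder. This sketch uses exact rational arithmetic (`fractions.Fraction`) and returns one number inside the final subinterval instead of a byte stream, which is a simplification and not how the patent necessarily stores the result. The names `build_subintervals`, `encode`, `decode`, and `size` are hypothetical, and every symbol is assumed to have a nonzero weight.

```python
from fractions import Fraction


def build_subintervals(weights):
    """Map each symbol to its [low, high) share of the unit interval, in sorted order."""
    total = sum(Fraction(w) for w in weights.values())
    bounds, low = {}, Fraction(0)
    for symbol in sorted(weights):                 # sorted order fixes the symbol order
        high = low + Fraction(weights[symbol]) / total
        bounds[symbol] = (low, high)
        low = high
    return bounds


def encode(data, weights, size=2 ** 32):
    """Narrow [0, size) once per symbol (S171, S172); return a number in the final interval."""
    bounds = build_subintervals(weights)
    low, width = Fraction(0), Fraction(size)
    for symbol in data:
        s_low, s_high = bounds[symbol]             # KeyError if a value has no subinterval (S173)
        low, width = low + width * s_low, width * (s_high - s_low)
    return low + width / 2                         # midpoint of the lowest-level subinterval


def decode(code, count, weights, size=2 ** 32):
    """Invert encode() by locating, level by level, the subinterval containing the code."""
    bounds = build_subintervals(weights)
    low, width = Fraction(0), Fraction(size)
    out = []
    for _ in range(count):
        position = (code - low) / width
        for symbol, (s_low, s_high) in bounds.items():
            if s_low <= position < s_high:
                out.append(symbol)
                low, width = low + width * s_low, width * (s_high - s_low)
                break
    return out


if __name__ == "__main__":
    weights = {10: 0.5, 20: 0.25, 30: 0.25}
    data = [10, 20, 10, 30, 10]
    code = encode(data, weights)
    print(decode(code, len(data), weights) == data)
```

Practical coders of this kind work on fixed-width integers and emit digits as the interval narrows; the exact-fraction version above trades efficiency for a short, lossless round trip.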
It should be understood that although the steps in the flowcharts of Figs. 2 to 5 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in Figs. 2 to 5 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 6, an industrial equipment time series data compression storage apparatus is provided, including:
a data acquisition module 210, configured to acquire the time series data collected at the current time;
a deviation calculation module 220, configured to calculate the deviation between the time series data and the time series data recorded last time;
a data recording module 230, configured to record the time series data when the deviation exceeds the dead zone range;
a frequency counting module 240, configured to obtain the time series data recorded multiple times and count the frequency of occurrence of each feature value in that data;
an information entropy calculation module 250, configured to calculate the information entropy of each feature value from the counted frequency;
a weight calculation module 260, configured to obtain the weight of each feature value as the proportion of that feature value's information entropy in the total information entropy, where the total information entropy is the sum of the information entropies of all feature values; and
a compression storage module 270, configured to take each feature value as a symbol, the weight of the feature value as the probability for interval coding, and the sorted order of the feature values as the order of the symbols, to encode the time series data recorded multiple times by interval coding, and to obtain and store the encoding result.
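The chain from frequency counting to weights (modules 240 to 260) might look like the sketch below. It assumes that the information entropy of a feature value with relative frequency p is the term -p·log2(p); this section does not spell out the formula, so that definition, the function name `entropy_weights`, and the uniform fallback for a single distinct value are assumptions.

```python
import math
from collections import Counter


def entropy_weights(values):
    """Weight of each distinct value = its entropy term / sum of all entropy terms."""
    counts = Counter(values)                  # frequency of each feature value
    n = len(values)
    entropy = {}
    for value, count in counts.items():
        p = count / n
        entropy[value] = -p * math.log2(p)    # assumed per-value information entropy
    total = sum(entropy.values())             # total information entropy
    if total == 0:
        # Only one distinct value: every entropy term is zero, so fall back to
        # equal weights (an assumption; the text does not cover this case).
        return {value: 1.0 / len(counts) for value in counts}
    return {value: h / total for value, h in entropy.items()}


if __name__ == "__main__":
    print(entropy_weights([1, 1, 2, 3]))
```

With this definition the weights are not the raw frequencies: in the sample input the value 1 occurs twice as often as 2 or 3, yet all three receive the same weight because their entropy terms are equal.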
In one embodiment, the industrial equipment time series data compression storage apparatus further includes:
a decompression average error calculation module, configured to obtain the original data before compression and the decompressed data after decompression according to the compressed data stored last time, and to calculate the decompression average error;
a difference sign judgment module, configured to calculate the difference between the decompression average error and the standard decompression average error, and to determine whether the difference is positive;
a dead zone limit reducing module, configured to reduce the dead zone limit if the difference is positive; and
a dead zone limit increasing module, configured to increase the dead zone limit if the difference is negative. If the difference is zero, the dead zone limit is unchanged. The dead zone range is determined from the dead zone limit.
In one embodiment, the dead zone limit reducing module is further configured to calculate a dead zone limit adjusting value according to the difference and an adjusting ratio if the difference is a positive number, and reduce the dead zone limit according to the adjusting value; the dead zone limit value increasing module is also used for calculating a dead zone limit value adjusting value according to the difference value and the adjusting proportion and increasing the dead zone limit value according to the dead zone limit value adjusting value if the difference value is a negative number.
In one embodiment, the industrial equipment time series data compression storage device further includes: the adjustment proportion increasing module is used for increasing the adjustment proportion if the difference value between the decompression average error continuously calculated for multiple times and the standard decompression average error is positive; and the adjustment proportion reducing module is used for reducing the adjustment proportion if the difference value between the decompression average error calculated for a plurality of times continuously and the standard decompression average error is negative.
In one embodiment, the decompression average error calculation module includes: an original data acquisition unit, configured to obtain the original data before compression according to the compressed data stored last time, where the original data includes all time sequence data acquired for the last compression; a decompressed data calculation unit, configured to perform inverse coding of the interval coding on the compressed data stored last time and to decompress the result with a decompression algorithm to obtain the decompressed data; and a decompression average error calculation unit, configured to calculate the decompression average error from the original data and the decompressed data.
In one embodiment, the time sequence data recorded multiple times is the time sequence data recorded a preset number of times, or the time sequence data recorded within a preset time period.
In one embodiment, the dead zone range is determined from the time sequence data recorded last time and the dead zone limit.
For specific limitations of the industrial equipment time series data compression storage apparatus, reference may be made to the limitations of the industrial equipment time series data compression storage method above, which are not repeated here. Each module of the apparatus can be implemented wholly or partly in software, hardware, or a combination of the two. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Fig. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment in which the operating system and the computer program in the non-volatile storage medium run. The database stores the encoding result data. The network interface is used to communicate with an external terminal over a network connection. When executed by the processor, the computer program implements the industrial equipment time series data compression storage method.
Those skilled in the art will appreciate that the architecture shown in fig. 7 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In an embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described, but any combination that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it is not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (10)

1. A method for compressing and storing time series data of industrial equipment is characterized by comprising the following steps:
acquiring time sequence data acquired at the current time;
calculating the deviation of the time sequence data and the time sequence data recorded last time;
when the deviation exceeds a dead zone range, recording the time sequence data;
acquiring time sequence data recorded for multiple times, and counting the occurrence frequency of each characteristic value in the time sequence data recorded for multiple times;
calculating the information entropy of each characteristic value according to the statistical frequency;
obtaining the weight of each characteristic value according to the weight of the information entropy of each characteristic value in the total information entropy; wherein, the total information entropy is the sum of the information entropies of all the characteristic values;
and coding the time sequence data recorded for multiple times by adopting a section coding mode by taking each characteristic value as a symbol, taking the weight of the characteristic value as the probability of section coding and taking the sequencing sequence of the characteristic values as the sequence of the symbol, and obtaining and storing a coding result.
2. The method of claim 1, wherein, before recording the time sequence data when the deviation exceeds the dead zone range, the method further comprises:
according to the compressed data stored last time, acquiring original data before compression and decompressed data after decompression, and calculating a decompression average error;
calculating the difference value between the decompression average error and the standard decompression average error, and judging whether the difference value is a positive number;
if the difference is a positive number, reducing the dead zone limit value;
if the difference is negative, increasing the dead zone limit value;
if the difference is zero, the dead zone limit value is unchanged;
wherein the dead band range is determined from the dead band limit.
3. The method of claim 2, wherein reducing the dead band limit if the difference is a positive number comprises: if the difference is a positive number, calculating a dead zone limit adjustment value according to the difference and the adjustment proportion, and reducing the dead zone limit according to the adjustment value;
if the difference is negative, increasing the dead zone limit value, including: and if the difference is a negative number, calculating a dead zone limit adjusting value according to the difference and the adjusting proportion, and increasing the dead zone limit according to the dead zone limit adjusting value.
4. The method of claim 3, after calculating a difference between the decompressed average error and the standard decompressed average error and determining whether the difference is positive, comprising:
if the difference value between the decompression average error calculated for a plurality of times continuously and the standard decompression average error is positive, increasing the adjustment proportion;
and if the difference value of the decompression average error calculated for a plurality of times in succession and the standard decompression average error is negative, reducing the adjustment proportion.
5. The method of claim 2, wherein the obtaining of original data before compression and decompressed data after decompression according to the compressed data stored last time, and calculating a decompression average error comprises:
acquiring original data before compression according to the last stored compressed data, wherein the original data comprises all time sequence data acquired by the last compression;
according to the compressed data stored last time, performing inverse coding of interval coding on the compressed data, and decompressing by a decompression algorithm to obtain decompressed data;
and calculating the decompression average error according to the original data and the decompression data.
6. The method according to claim 1, wherein the time series data recorded a plurality of times is time series data recorded a preset number of times in succession; or the time sequence data recorded for multiple times is the time sequence data recorded in the preset time period.
7. The method of claim 2, wherein the dead zone range is determined from the time sequence data recorded last time and the dead zone limit.
8. An industrial equipment time sequence data compression and storage device is characterized by comprising:
the data acquisition module is used for acquiring time sequence data acquired at the current time;
the deviation calculation module is used for calculating the deviation of the time sequence data and the time sequence data recorded last time;
the data recording module is used for recording the time sequence data when the deviation exceeds a dead zone range;
the frequency counting module is used for acquiring the time sequence data recorded for multiple times and counting the frequency of each characteristic value in the time sequence data recorded for multiple times;
the information entropy calculation module is used for calculating the information entropy of each characteristic value according to the statistical frequency;
the weight calculation module is used for obtaining the weight of each characteristic value according to the weight of the information entropy of each characteristic value in the total information entropy; wherein, the total information entropy is the sum of the information entropies of all the characteristic values;
and the compression storage module is used for coding the time sequence data recorded for multiple times in a mode of interval coding by taking each characteristic value as a symbol, taking the weight of the characteristic value as the probability of interval coding and taking the sequencing sequence of the characteristic values as the sequence of the symbol, and obtaining and storing a coding result.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202210913155.9A 2022-08-01 2022-08-01 Industrial equipment time sequence data compression storage method and device Active CN114969060B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210913155.9A CN114969060B (en) 2022-08-01 2022-08-01 Industrial equipment time sequence data compression storage method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210913155.9A CN114969060B (en) 2022-08-01 2022-08-01 Industrial equipment time sequence data compression storage method and device

Publications (2)

Publication Number Publication Date
CN114969060A CN114969060A (en) 2022-08-30
CN114969060B true CN114969060B (en) 2022-11-04

Family

ID=82969668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210913155.9A Active CN114969060B (en) 2022-08-01 2022-08-01 Industrial equipment time sequence data compression storage method and device

Country Status (1)

Country Link
CN (1) CN114969060B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A method and device for compressing and storing timing data of industrial equipment

Effective date of registration: 20231108

Granted publication date: 20221104

Pledgee: Guotou Taikang Trust Co.,Ltd.

Pledgor: Zhejiang Mulian Internet of things Technology Co.,Ltd.

Registration number: Y2023980064454