CN102427369A - Real-time holographic lossless compression method for productive time sequence data - Google Patents

Real-time holographic lossless compression method for productive time sequence data

Info

Publication number
CN102427369A
Authority
CN
China
Prior art keywords
data
quality
difference
time
time tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011103178943A
Other languages
Chinese (zh)
Other versions
CN102427369B (en)
Inventor
周伊琳
陈炯聪
黄缙华
孙建伟
陈扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of Guangdong Power Grid Co Ltd
Original Assignee
Electric Power Research Institute of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Research Institute of Guangdong Power Grid Co Ltd
Priority to CN201110317894.3A
Publication of CN102427369A
Application granted
Publication of CN102427369B
Legal status: Active
Anticipated expiration

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a real-time holographic lossless compression method for production time series data. The method independently compresses the three value fields of each of N production time series data items (the time tag, the data value and the data quality) to form time tag compressed data, data value compressed data and data quality compressed data respectively, and then merges the three into one complete compressed data block. The method compresses production time series data and files from any industry efficiently and losslessly, and meets the urgent demand for transmitting, distributing, processing and storing time series data in industries that generate enormous volumes of production data, such as basic industry, electric power, telecommunications, chemical engineering and steel.

Description

Holographic real-time lossless compression method for production time series data
Technical field
The present invention relates to a data compression method, and specifically to a holographic real-time lossless compression method for production time series data.
Background art
At present, fields ranging from computer hardware and software to industrial control are all developing rapidly; in computing, multi-core technology and multi-node high-speed physical memory technology have become mature, stable foundations for concurrency. In industrial control, and especially in the power industry, the steadily increasing refinement of control and applications has raised the demands placed on production time series data to a new level: sampling is accurate to 100 frames per second, and time series data relevant to operational analysis is usually required to be stored online for five years or more. Because each data item is high-precision and many years of operation accumulate a very large amount of real-time data, storing the data on storage devices alone requires a large number of devices and the machine rooms to house them. Moreover, the data must not only be stored safely and effectively, but must also be extracted and accessed at any time in later production practice. The modern production control field also places very high requirements on data scale and response speed, which is a great challenge for operators, who have to spend considerable production and operating costs to meet these demands. In particular, as regards compressed storage, none of the existing data processing methods can be applied directly to production time series data in the above fields.
Production time series data has traditionally been handled with two strategies. (1) File-level compression, similar to WinZip. This mode does not suit production time series data well because its real-time performance is poor, the amount of computation during compression is huge, and the whole file must be decompressed when data is retrieved. It is also not targeted at real-time production data, so its compression ratio is low. (2) Swinging door lossy compression, which applies a filtering rule that discards part of the data with small changes, achieving compression through filtering. Swinging door lossy compression is implemented as follows:
The swinging door trending (SDT) compression algorithm is a linear trend compression algorithm; in essence it replaces a series of consecutive data points with a straight line determined by a start point and an end point. The algorithm records the time interval length, the start point data and the end point data of each segment; note that the end point of one segment is the start point of the next. Its basic principle is fairly simple, as shown in the schematic diagram of accompanying drawing 1: there is one point above and one point below the first data point a, each at distance E from a, and these two points serve as the two pivots of the "doors". When there is only the first data point, both doors are closed; as more points arrive, the doors gradually open. Note that the width of each door can stretch, and within one time interval a door, once opened, cannot close. As long as the two doors have not become parallel, that is, the sum of the two interior angles is less than 180°, the "swinging door" operation continues. In the figure the first time segment runs from a to e, and the result is that the straight line from point a to point e replaces the data points (a, b, c, d, e); the second time segment starts from point e with both doors closed, the doors then open gradually, and the subsequent operation is the same as for the previous segment.
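The door-opening rule described above can be illustrated with a short sketch. This is a common textbook formulation of swinging door trending, not code taken from the patent; the function name `swinging_door` and the slope-based bookkeeping are assumptions made for illustration, with `e` standing for the distance E between the first point and each pivot.

```python
def swinging_door(points, e):
    """Return the points kept by a swinging-door pass with tolerance e > 0.

    points: list of (t, v) pairs with strictly increasing t.
    Hypothetical helper for illustration only.
    """
    if len(points) <= 2:
        return list(points)
    kept = [points[0]]
    t0, v0 = points[0]      # start point of the current segment
    up = float("-inf")      # steepest slope seen from the upper pivot (v0 + e)
    low = float("inf")      # shallowest slope seen from the lower pivot (v0 - e)
    prev = points[0]
    for t, v in points[1:]:
        # A door only ever opens wider within one segment.
        up = max(up, (v - (v0 + e)) / (t - t0))
        low = min(low, (v - (v0 - e)) / (t - t0))
        if up > low:
            # The doors have opened past parallel: end the segment at the
            # previous point and start a new segment from it.
            kept.append(prev)
            t0, v0 = prev
            up = (v - (v0 + e)) / (t - t0)
            low = (v - (v0 - e)) / (t - t0)
        prev = (t, v)
    kept.append(points[-1])
    return kept
```

On a perfectly linear series only the two end points survive, which shows why the intermediate samples cannot be recovered exactly once they deviate from the line by less than E.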
Although the swinging door trending (SDT) algorithm has good real-time performance, it discards part of the data when compressing, and therefore cannot meet the requirement that production time series data be compressed losslessly.
Summary of the invention
The object of the present invention is to provide a holographic real-time lossless compression method for production time series data. The method compresses production time series data and files from all industries efficiently and losslessly, and meets the urgent demand for transmitting, distributing, processing and storing time series data in industries that generate huge volumes of production data, such as basic industry, electric power, telecommunications, chemical engineering and steel.
The object of the present invention can be achieved through the following technical measures:
A holographic real-time lossless compression method for production time series data, characterized in that: the three value fields of each of N production time series data items with sequence numbers 1 to N, namely the time tag, the data value and the data quality, are compressed independently to form time tag compressed data, data value compressed data and data quality compressed data respectively; the three parts are then merged into one complete compressed data block.
Wherein the compression process for the time tags is:
1a) Record the first time tag in the time tag compressed data; compute the difference between the first two time tags as the predicted time tag difference, and record it in the time tag compressed data;
1b) Starting from the third time tag, compute in turn the difference between the current time tag and the previous one, and compare it with the predicted time tag difference: if the two are equal, the current time tag is a regular time tag and is not processed; otherwise the current time tag is an irregular time tag, and the current irregular time tag and its sequence number are recorded in the time tag compressed data;
1c) Repeat step 1b) until all N time tags have been processed;
The compression process for the data values is:
2a) Record the first data value in the data value compressed data;
2b) Compute the differences between adjacent pairs among the first K+1 data values, obtaining K predicted data value differences in total, denoted ΔV_0, ..., ΔV_{K-1}, and record them in the data value compressed data; the sequence numbers of the predicted data value differences are 0 to K-1;
2c) Starting from the (K+2)-th data value, record a fixed-width compression header for each data value, and compute the difference between the current data value and the previous one: if the current difference equals one of the predicted data value differences, set the compression header of this data value to 0 and then record the sequence number of the matching predicted difference; otherwise, find the predicted difference ΔV_j closest to the current difference, XOR the value (previous data value + ΔV_j) with the current data value, record in the compression header the number n of consecutive identical bits counted from the most significant bit of the result, and then record the sequence number j of the closest predicted difference and the low 32-n bits of the current data value;
The compression process for the data quality is:
3a) Record the first data quality in the data quality compressed data;
3b) Compute the differences between adjacent pairs among the first K+1 data qualities as the predicted data quality differences, and record the K predicted data quality differences in the final data quality compressed data;
3c) Starting from the (K+2)-th data quality, compute the difference between the current data quality and the previous one; if the current difference equals some Δi among the first K differences, the current data quality is a regular data quality, and the sequence number i of that predicted data quality difference is recorded in temporary compressed data A; otherwise the current data quality is an irregular data quality, and its sequence number and the current data quality difference are recorded in temporary compressed data B;
3d) Repeat step 3c) while counting the irregular data qualities; after all N data qualities have been processed, append the number of irregular data qualities, temporary compressed data A and temporary compressed data B, in that order, to the end of the data quality compressed data.
The time tag compression process further comprises compressing the sequence numbers of the time tags, specifically: while repeating step 1b), accumulate and record the number of irregular time tags, and from this number compute the total storage needed for the irregular time tag sequence numbers accumulated so far; if this total exceeds N bits, express the sequence numbers of all N time tags as an N-bit bit field, record the result in the time tag compressed data, and delete the irregular time tag sequence numbers already recorded in the time tag compressed data.
The N-bit bit field expresses the sequence numbers of all N time tags as follows: each bit of the N-bit binary value corresponds to the time tag at that position; 0 indicates that the corresponding time tag is regular, and 1 indicates that it is irregular.
The time tag compression process further comprises a further compression of the irregular time tags, specifically:
1i) While step 1b) is repeated over all N time tags, find the maximum and minimum values among the irregular time tags, together with their sequence numbers, and record them in the time tag compressed data;
1ii) Compute the difference T between the maximum and minimum values among the irregular time tags, forming the continuous integer interval [0, T];
1iii) In the time tag compressed data already recorded, starting from the first recorded irregular time tag, record in the time tag compressed data the position within [0, T] of the difference between the current irregular time tag and the minimum value, together with the sequence number of the current irregular time tag; at the same time delete the irregular time tags and sequence numbers previously recorded in the time tag compressed data.
The data quality compression process further comprises compressing the sequence numbers of the irregular data qualities in temporary compressed data B, specifically: if the accumulated storage for the recorded sequence numbers of irregular data qualities exceeds N bits, express the sequence numbers of all N data qualities as an N-bit bit field, and delete the irregular data quality sequence numbers already recorded in temporary compressed data B.
The N-bit bit field expresses the sequence numbers of all N data qualities as follows: each bit of the N-bit binary value corresponds to the data quality at that position; 0 indicates that the corresponding data quality is regular, and 1 indicates that it is irregular.
The data quality compression process further comprises a second compression of temporary compressed data B, specifically:
3i) While step 3c) is repeated over all N data qualities, find the maximum and minimum values among the data quality differences, together with the sequence numbers of the data qualities corresponding to them, and record them in the data quality compressed data;
3ii) Compute the difference L between the maximum and minimum data quality differences, forming the continuous integer interval [0, L];
3iii) Starting from the first irregular data quality difference recorded in temporary compressed data B, compute the difference T between the current data quality difference and the minimum data quality difference, and replace the irregular data quality difference originally recorded in temporary compressed data B with T.
K is 3 or 5.
The sequence numbers of the time tags, data values and data qualities are each recorded as a 2-byte short. The time tag is an arithmetically increasing or irregularly increasing 4-byte long, or a 4-byte long plus a 2-byte integer carrying millisecond precision.
The data value is of single-precision 32-bit float or double-precision 64-bit double type.
The data quality is a 4-byte or 8-byte integer representing the status flag value of the current production time series data.
The time tag, data value and data quality compression processes each include a global optimization step, specifically: if the compressed data is greater than or equal to the original data in size, the original data is recorded directly.
Compared with the prior art, the present invention has the following advantages:
1. The method compresses production time series data and files from all industries efficiently and losslessly, and meets the urgent demand for transmitting, distributing, processing and storing time series data in industries that generate huge volumes of production data, such as basic industry, electric power, telecommunications, chemical engineering and steel; it can compress or decompress any data that has the real-time characteristics typical of production time series data, is highly targeted, and causes no loss of precision in the production data;
2. The method maps the differences of integer values onto a continuous integer interval, which further improves the compression ratio.
Description of drawings
Fig. 1 is a schematic flow chart of the holographic real-time lossless compression method for production time series data of the present invention;
Fig. 2 is a flow chart of the time tag compression process in the method shown in Fig. 1;
Fig. 3 is a flow chart of the data value compression process in the method shown in Fig. 1;
Fig. 4 is a flow chart of the data quality compression process in the method shown in Fig. 1.
Embodiment
In a real series of production time series data the time tags are all increasing values. A fixed sampling interval is used, and the sampled data is the original data to be compressed, so the original data is also in increasing order. The sampling period is generally fixed as well, that is, the same amount of original data is obtained each time. The following processing flow generally compresses 1000 original data items at a time. Because adjacent original data items differ very little, in most cases they fall within a continuous interval of very small span, so the present invention adopts the compression processing of the following embodiment.
As shown in Fig. 1, the method compresses the three parts of the production time series data, namely the time tag, the data value and the data quality, separately, forming time tag compressed data, data value compressed data and data quality compressed data respectively.
Wherein, as shown in the flow chart of Fig. 2, the compression process for the time tags is:
1a) Record the first time tag in the time tag compressed data; compute the difference between the first two time tags as the predicted time tag difference, and record it in the time tag compressed data:
The N production time series data items contain N time tags in order, with sequence numbers 1 to N; compute the difference between the first and second time tags, which is the predicted time tag difference Δt_1, and record the first time tag and Δt_1 in the time tag compressed data;
1b) Starting from the third time tag, compute in turn the difference Δt_i between the current time tag and the previous one, and compare Δt_i with the predicted time tag difference Δt_1: if the two are equal, the current time tag is a regular time tag and is not processed; otherwise the current time tag is an irregular time tag, and the current irregular time tag and its sequence number are recorded in the time tag compressed data;
1c) Repeat step 1b) until all N time tags have been processed.
N is the amount of original data; how large it is depends on the application, and larger is usually better, but it cannot be unlimited in real-time compression and is generally 1000. For these 1000 original data items, suppose all are sampled at a constant period and each time tag is 6 bytes, comprising 4 bytes of seconds and 2 bytes of milliseconds; the 1000 original time tags then total 6*1000 = 6000 bytes. Under the compression rule above, only the following need to be stored: 6 bytes for the first record, 4 bytes for the difference between the second and first records, and 2 bytes for the number of irregular data items, 12 bytes in all, so the compression ratio is 6000/12 = 500 times.
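Steps 1a) to 1c) and the 12-byte estimate can be sketched as follows. This is a minimal illustration, assuming time tags are already plain integers (for example milliseconds); the function names, the dictionary layout and the default field widths are hypothetical and chosen only to mirror the byte counts quoted in the text.

```python
def compress_time_tags(tags):
    """Steps 1a)-1c) for a list of at least two integer time tags."""
    first = tags[0]
    predicted = tags[1] - tags[0]            # step 1a: predicted difference
    irregular = []                           # (1-based sequence number, time tag)
    for i in range(2, len(tags)):            # step 1b: from the third tag on
        if tags[i] - tags[i - 1] != predicted:
            irregular.append((i + 1, tags[i]))
    return {"n": len(tags), "first": first, "delta": predicted,
            "irregular": irregular}


def decompress_time_tags(rec):
    """Rebuild the original tags: regular tags are previous tag + delta."""
    irregular = dict(rec["irregular"])
    tags = [rec["first"], rec["first"] + rec["delta"]]
    for seq in range(3, rec["n"] + 1):
        tags.append(irregular.get(seq, tags[-1] + rec["delta"]))
    return tags


def compressed_size_bytes(rec, tag_bytes=6, delta_bytes=4,
                          count_bytes=2, seq_bytes=2):
    """Size estimate using the field widths quoted in the text (assumed)."""
    per_irregular = seq_bytes + tag_bytes
    return (tag_bytes + delta_bytes + count_bytes
            + len(rec["irregular"]) * per_irregular)


# 1000 tags at a constant 10 ms period: 6000 raw bytes against 12 compressed.
tags = [1_000_000 + 10 * i for i in range(1000)]
rec = compress_time_tags(tags)
ratio = (6 * len(tags)) / compressed_size_bytes(rec)   # 500.0
```

Every irregular tag adds 8 bytes in this layout, so the ratio falls quickly once the sampling period starts to jitter, which is what the further steps for sequence numbers and irregular tags address.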
The time tag compression process also includes compressing the sequence numbers of the time tags, specifically: if the accumulated storage for the recorded irregular time tag sequence numbers exceeds N bits, the sequence numbers of all N time tags are expressed as an N-bit bit field: each bit of the N-bit binary value corresponds to the time tag at that position, with 0 indicating that the corresponding time tag is regular and 1 indicating that it is irregular. In practice, the N-bit bit field is recorded at positions 11 to 20 of the time tag compressed data, and the irregular time tag sequence numbers already recorded in the time tag compressed data are deleted. For example, with N = 1000 each sequence number takes 2 bytes to store; if the number of irregular time tags is greater than or equal to 63, storing their sequence numbers takes at least 63*2*8 = 1008 bits, so a 1000-bit bit field can store the sequence numbers (positions) of the irregular time tags instead.
A bit field is a data structure in the C language: it divides the binary bits of a byte into several different regions and specifies the number of bits in each region. Each region has a field name, which allows the program to operate on it by name. In this way the bits of one byte can represent several different objects.
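The switch from a list of 2-byte sequence numbers to an N-bit bit field can be sketched as below. The helper names are hypothetical, and packing the flags into a Python `bytes` object with the least significant bit first is an assumption; the text does not specify a bit order.

```python
def use_bit_field(n, irregular_count, seq_bytes=2):
    """True when the listed sequence numbers would take more than n bits."""
    return irregular_count * seq_bytes * 8 > n


def to_bit_field(n, irregular_seqs):
    """Pack 1-based irregular sequence numbers into an n-bit field (1 = irregular)."""
    field = bytearray((n + 7) // 8)
    for seq in irregular_seqs:
        pos = seq - 1
        field[pos // 8] |= 1 << (pos % 8)
    return bytes(field)


def from_bit_field(n, field):
    """Recover the 1-based sequence numbers whose bit is set."""
    return [p + 1 for p in range(n) if (field[p // 8] >> (p % 8)) & 1]
```

With N = 1000, `use_bit_field(1000, 62)` is false (992 bits) and `use_bit_field(1000, 63)` is true (1008 bits), matching the threshold in the example; the resulting field always occupies 125 bytes regardless of how many tags are irregular.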
Further, the time tag compression process also includes a further compression of the irregular time tags, specifically:
1d) While step 1b) is repeated over all N time tags, find the maximum and minimum values among the irregular time tags, together with their sequence numbers, and record them in the time tag compressed data; from this the maximum number of bytes needed to store the difference between the maximum and minimum can be computed (1 byte stores the range 0 to 255, 2 bytes store the range 0 to 65535, and so on).
1e) Compute the difference T between the maximum and minimum values among the irregular time tags, forming the continuous integer interval [0, T];
1f) In the time tag compressed data already recorded, starting from the first recorded irregular time tag, record in the time tag compressed data the position within [0, T] of the difference between the current irregular time tag and the minimum value, together with the sequence number of the current irregular time tag; at the same time delete the irregular time tags and sequence numbers previously recorded in the time tag compressed data.
The main benefit of this compression of irregular time tags is as follows. Suppose a large number of irregular time tags occur in the current data, each of which takes 6 bytes to store. If the above compression is applied, for example with a maximum of 145 and a minimum of 126 among the irregular time tags, the difference T between maximum and minimum is 19, which forms the continuous integer interval [0, 19]. With this strategy, for any irregular time tag only the difference between it and the minimum needs to be recorded, and this difference always lies within the continuous integer interval. Take the original value 130: the original value needs 4 bytes to store, whereas now only its position 130-126 = 4 in the integer interval needs to be stored, and the decimal value 4 takes only 3 bits in binary. In this case the compression ratio is (4*8)/3, approximately 10 times; after allowing for other internal overhead, the compression ratio for irregular time tags can stay above 9 times, which is considerable.
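The mapping onto the interval [0, T] can be sketched as follows. The function names are hypothetical, and the fixed bit width computed here is an assumption about how the offsets would be stored: under that assumption the interval [0, 19] needs 5 bits per offset, while the single value 4 from the example fits in 3 bits.

```python
def pack_offsets(values):
    """Map each value to its offset from the minimum (steps 1d-1f).

    Returns (minimum, bits per offset under a fixed-width code, offsets).
    """
    lo, hi = min(values), max(values)
    span = hi - lo                        # T in the text
    width = max(1, span.bit_length())     # bits needed to cover [0, T]
    return lo, width, [v - lo for v in values]


def unpack_offsets(lo, offsets):
    """Invert pack_offsets: original value = minimum + offset."""
    return [lo + o for o in offsets]


lo, width, offsets = pack_offsets([126, 130, 145])
# lo == 126, width == 5, offsets == [0, 4, 19]
```

The saving comes from storing a few bits per irregular tag instead of a full 4-byte or 6-byte value, at the cost of recording the minimum once.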
As shown in the flow chart of Fig. 3, the compression process for the data values is:
2a) Record the first data value in the data value compressed data;
2b) Compute the differences between adjacent pairs among the first K+1 data values, obtaining K predicted data value differences in total, and record them in the data value compressed data;
2c) Starting from the (K+2)-th data value, record a fixed-width compression header for each data value, and compute the difference between the current data value and the previous one: if the current difference equals one of the predicted data value differences, set the compression header of this data value to 0 and then record the sequence number of the matching predicted difference; otherwise, find the predicted difference ΔV_j closest to the current difference, XOR the value (previous data value + ΔV_j) with the current data value, record in the compression header the number n of consecutive identical bits counted from the most significant bit of the result, and then record the sequence number j of the closest predicted difference and the low 32-n bits of the current data value;
Because the original data value is a 32-bit floating-point number, the XOR result is also 32 bits, so there are at most 32 identical bits, and 6 bits suffice to record their number. Therefore, whether regular or irregular data is being recorded, a fixed 6-bit compression header is always recorded first, holding the number n of consecutive identical bits counted from the most significant bit of the XOR result. For regular data the XOR result is 32 zero bits, so the value of the 6-bit compression header is 32, followed by 2 bits recording j (for K=3; 3 bits are needed for the sequence number j when K=5). For irregular data, the fixed compression header is followed by the sequence number j of the closest predicted difference and then the low 32-n bits of the data value that the XOR result shows to differ.
The low 32-n bits of the current data value are written to the data value compressed data as a bit stream. Ultimately all consecutive floating-point data values are packed together as a bit stream, and the whole process is implemented with efficient bit operations.
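The XOR-with-prediction step can be sketched as follows for 32-bit floats. The function names are hypothetical, and two simplifications are assumptions of this sketch rather than the patented layout: each value is returned as a tuple (n, j, low bits) instead of being packed into a bit stream, and the first K+1 values are kept verbatim (instead of the first value plus K differences) so that floating-point rounding cannot affect reconstruction. Regular and irregular values are handled uniformly, with n = 32 when the prediction is exact.

```python
import struct


def f32_bits(x):
    """Bit pattern of x rounded to IEEE 754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b):
    return struct.unpack("<f", struct.pack("<I", b))[0]


def f32(x):
    """Round a Python float to the nearest single-precision value."""
    return bits_f32(f32_bits(x))


def leading_same_bits(a, b):
    """Number of identical bits counted from the most significant bit."""
    return 32 - (f32_bits(a) ^ f32_bits(b)).bit_length()


def encode_values(values, k=3):
    """Return (first k+1 values, [(n, j, low 32-n bits), ...])."""
    vals = [f32(v) for v in values]
    deltas = [f32(vals[i + 1] - vals[i]) for i in range(k)]      # step 2b
    records = []
    for i in range(k + 1, len(vals)):                            # step 2c
        prev, cur = vals[i - 1], vals[i]
        d = cur - prev
        j = min(range(k), key=lambda idx: abs(deltas[idx] - d))  # closest
        predicted = f32(prev + deltas[j])
        n = leading_same_bits(predicted, cur)
        low = f32_bits(cur) & ((1 << (32 - n)) - 1)
        records.append((n, j, low))
    return vals[:k + 1], records


def decode_values(head, records):
    k = len(head) - 1
    vals = list(head)
    deltas = [f32(vals[i + 1] - vals[i]) for i in range(k)]
    for n, j, low in records:
        predicted = f32_bits(f32(vals[-1] + deltas[j]))
        high_mask = ~((1 << (32 - n)) - 1) & 0xFFFFFFFF
        vals.append(bits_f32((predicted & high_mask) | low))
    return vals
```

In this representation a value costs 6 + 2 + (32 - n) bits for K = 3, so the closer the prediction, the fewer low bits have to be written.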
Because the data value is a 32-bit floating-point number, n is at most 32, so storing n takes only 6 bits; the sequence number j is a value less than 3, so storing j takes only 2 bits. In addition, because of the nature of time series data, the data values of two adjacent items are very similar. For example, two consecutive samples 1129.32 and 1129.51 are each 32 bits; XORing them bit by bit shows that the high 19 bits of the two samples are identical and only the low 13 bits differ. The present invention introduces difference prediction from the first K samples (K = 3 or 5); adding a predicted difference to the previous data value gives a value that is even more similar to the current data value, so the number of differing bits after the XOR is no more than 13, and storing the current data value takes at most 2+6+13 = 21 bits. This compression method is very efficient for jump data, sawtooth waveforms and periodically repeating data.
As shown in the flow chart of Fig. 4, the compression process for the data quality is:
3a) Record the first data quality in the data quality compressed data;
3b) Compute the differences between adjacent pairs among the first K+1 data qualities as the predicted data quality differences, and record the K predicted data quality differences in the final data quality compressed data. K is generally 3 or 5. In practice the data quality of production time series data generally jumps rarely, so the first K differences alone usually give good compression, and a Δi number takes only 2 bits to express (3 bits when K = 5); this has worked very well in practice;
3c) Starting from the (K+2)-th data quality, compute the difference between the current data quality and the previous one; if the current difference equals some Δi among the first K differences, the current data quality is a regular data quality, and the sequence number i of that predicted data quality difference is recorded in temporary compressed data A; otherwise the current data quality is an irregular data quality, and its sequence number and the current data quality difference are recorded in temporary compressed data B;
3d) Repeat step 3c) while counting the irregular data qualities; after all N data qualities have been processed, append the number of irregular data qualities, temporary compressed data A and temporary compressed data B, in that order, to the end of the data quality compressed data.
The sequence number of each predicted difference is recorded in 2 bits, and the number of irregular data qualities is counted in step 3c). In the data quality compressed data, the record of the first data quality is therefore followed by the number of irregular data qualities, then (N − number of irregular data qualities) consecutive 2-bit "predicted difference sequence numbers" for the regular data, and then the "sequence number + difference" records of the irregular data. In short, the "predicted difference sequence numbers" of all regular data are stored together, the "sequence number + difference" records of all irregular data follow immediately, and the two are easily told apart using the "number of irregular data qualities". In the implementation, the information for regular and irregular data is first recorded in temporary variables, and only after all N data items have been processed is it copied and combined into the final compressed data.
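Steps 3a) to 3d) and the A/B layout can be sketched as follows, assuming data quality values are plain integers. The function names and the dictionary standing in for the byte layout are hypothetical; a real encoder would pack list `a` at 2 bits per entry (for K = 3) rather than keep Python lists.

```python
def compress_quality(q, k=3):
    """Split quality differences into regular (A) and irregular (B) records."""
    first = q[0]                                       # step 3a
    deltas = [q[i + 1] - q[i] for i in range(k)]       # step 3b
    a, b = [], []
    for i in range(k + 1, len(q)):                     # step 3c
        d = q[i] - q[i - 1]
        if d in deltas:
            a.append(deltas.index(d))                  # sequence number i of the matching delta
        else:
            b.append((i + 1, d))                       # 1-based sequence number + difference
    # Step 3d: irregular count, then A, then B.
    return {"n": len(q), "first": first, "deltas": deltas,
            "irregular_count": len(b), "a": a, "b": b}


def decompress_quality(rec):
    q = [rec["first"]]
    for d in rec["deltas"]:
        q.append(q[-1] + d)
    irregular = dict(rec["b"])
    regular = iter(rec["a"])
    for seq in range(len(q) + 1, rec["n"] + 1):
        if seq in irregular:
            d = irregular[seq]
        else:
            d = rec["deltas"][next(regular)]
        q.append(q[-1] + d)
    return q
```

Because the decoder knows from B which sequence numbers are irregular, the entries of A need no sequence numbers of their own and can simply be consumed in order.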
The data quality compression process also includes compressing the sequence numbers of the irregular data qualities in temporary compressed data B, specifically: if the accumulated storage for the recorded sequence numbers of irregular data qualities exceeds N bits, the sequence numbers of all N data qualities are expressed as an N-bit bit field, and the irregular data quality sequence numbers already recorded in temporary compressed data B are deleted. The N-bit bit field expresses the sequence numbers of all N data qualities as follows: each bit of the N-bit binary value corresponds to the data quality at that position; 0 indicates that the corresponding data quality is regular, and 1 indicates that it is irregular.
Furthermore, the data quality compression process also includes a secondary compression of interim compressed data B, specifically:
3e) While repeating step 3c) over all N data qualities, find the maximum and the minimum of the data quality differences, together with the sequence numbers of the data qualities that correspond to the maximum and the minimum, and record them in the data quality compressed data;
3f) Compute the difference L between the maximum and the minimum data quality difference, forming the continuous integer interval [0, L];
3g) Starting from the first non-regular data quality difference recorded in interim compressed data B, compute the difference T between the current data quality difference and the minimum data quality difference, and replace the non-regular data quality difference originally recorded in interim compressed data B with T.
In all compression steps, the sequence numbers of time tags, data values, and data qualities are recorded as 2-byte short integers.
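Steps 3e) to 3g) amount to re-expressing every non-regular difference as an offset from the minimum. The sketch below is illustrative only: it assumes interim compressed data B is a non-empty list of (sequence number, difference) pairs, and the bit width it reports is an assumed way of exploiting the interval [0, L], not something stated in the text.

```python
from typing import List, Tuple


def rebase_differences(interim_b: List[Tuple[int, int]]):
    """Replace each non-regular difference d by T = d - min, so that 0 <= T <= L."""
    seq_min, d_min = min(interim_b, key=lambda r: r[1])   # 3e) minimum and its sequence number
    seq_max, d_max = max(interim_b, key=lambda r: r[1])   # 3e) maximum and its sequence number
    span = d_max - d_min                                  # 3f) L, interval [0, L]
    rebased = [(seq, d - d_min) for seq, d in interim_b]  # 3g) T replaces the difference
    width = max(1, span.bit_length())                     # bits per T (assumption)
    return (seq_min, d_min), (seq_max, d_max), span, width, rebased


def restore_differences(d_min: int, rebased: List[Tuple[int, int]]):
    """Inverse of step 3g): add the minimum back to every offset T."""
    return [(seq, t + d_min) for seq, t in rebased]
```

Because every T is non-negative and bounded by L, a decoder only needs the recorded minimum to recover the original differences.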
A time tag is a 4-byte long integer that increases either arithmetically or irregularly, or a 4-byte long integer plus a 2-byte integer carrying millisecond precision.
A data value is an engineering value stored as an IEEE 754 floating-point number, either single precision (32-bit float) or double precision (64-bit double).
A data quality is a 4-byte or 8-byte integer holding the status flags of the current production time-series datum, used for example to indicate out-of-range conditions, alarms, and other user-defined meanings.
The time tag, data value, and data quality compression processes each include a global optimization step: if the compressed data is greater than or equal to the original data in size, the original data is recorded directly. Storing the original data when the two sizes are equal improves decompression efficiency. The method was tested on production time-series data in the following environment: CPU: Intel(R) Core(TM) i7-2620M CPU 2.7GHz; RAM: 4G. Compressing 1G of production data in memory took 1,512 ms, a large saving in compression time compared with existing compression methods. Because the method is lossless, all characteristics of the data are preserved during compression and no precision of the production data is lost. Applied to high-speed PMU data from the power grid, the method achieved a compression ratio more than 3 times that of existing lossy compression, with compression and decompression performance more than 2 times that of other implementations.
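The global optimization rule (record the original data whenever compression does not make it smaller) can be sketched as follows. The one-byte mode flag and the function names are assumptions; the text does not say how a decoder tells the two cases apart.

```python
RAW, PACKED = b"\x00", b"\x01"   # hypothetical one-byte mode flags


def choose_output(raw: bytes, packed: bytes) -> bytes:
    """Keep the compressed form only when it is strictly smaller than the original."""
    if len(packed) >= len(raw):
        return RAW + raw         # no gain: store the original data directly
    return PACKED + packed


def read_output(blob: bytes):
    """Return (is_packed, payload) for a blob written by choose_output."""
    return blob[:1] == PACKED, blob[1:]
```

When the two sizes are equal the raw branch is taken, so the decoder can skip decompression entirely.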
In practical application, the domestically developed high-speed real-time database system PTimeDB adopts the compression method of the present invention. On high-speed PMU data from the power grid, its compression ratio is more than 3 times that of lossy compression, and its compression and decompression performance is more than 2 times that of other implementations.
Embodiments of the present invention are not limited to the above. Based on the basic technical idea of the present invention described above, modifications, substitutions, or changes of other forms made to the content of the present invention according to ordinary technical knowledge and customary means in the art all fall within the scope of protection of the present invention.

Claims (10)

1. A real-time holographic lossless compression method for production time-series data, characterized in that: for N production time-series data with sequence numbers 1 to N, the three fields of each datum, namely the time tag, the data value, and the data quality, are compressed independently to form time tag compressed data, data value compressed data, and data quality compressed data respectively; the three parts are then merged into one complete compressed data;
wherein the compression process for the time tags is:
1a) record the first time tag in the time tag compressed data; compute the difference between the first two time tags as the predicted time tag difference, and record it in the time tag compressed data;
1b) starting from the third time tag, compute in turn the difference between the current time tag and its previous time tag, and compare it with the predicted time tag difference: if the two are equal, the current time tag is a regular time tag and is not processed further; otherwise the current time tag is a non-regular time tag, and the time tag and its sequence number are recorded in the time tag compressed data;
1c) repeat step 1b) until all N time tags have been processed;
the compression process for the data values is:
2a) record the first data value in the data value compressed data;
2b) compute the differences between each pair of adjacent values among the first K+1 data values, obtaining K predicted data value differences in total, and record them in the data value compressed data;
2c) starting from the (K+2)-th data value, record a fixed-width compressed data header for each data value, and compute the difference between the current data value and its previous data value: if the current difference equals one of the predicted data value differences, the header of this data value is set to 0, followed by the sequence number of the matching predicted difference; otherwise, find the predicted difference ΔVj closest to the current difference, XOR the value (previous data value + ΔVj) with the current data value, record in the header the number n of consecutive matching bits, counted from the most significant bit, in the XOR result, and then record the sequence number j of the closest predicted difference and the low 32-n bits of the current data value;
the compression process for the data qualities is:
3a) record the first data quality in the data quality compressed data;
3b) compute the differences between each pair of adjacent values among the first K+1 data qualities as the predicted data quality differences, and record the K predicted data quality differences in the final data quality compressed data;
3c) starting from the (K+2)-th data quality, compute the difference between the current data quality and its previous data quality; if the current difference equals some Δi among the first K differences, the current data quality is a regular data quality, and the sequence number i of that predicted difference is recorded in interim compressed data A; otherwise the current data quality is a non-regular data quality, and its sequence number and the current difference are recorded in interim compressed data B;
3d) repeat step 3c), counting the non-regular data qualities; after all N data qualities have been processed, append the number of non-regular data qualities, interim compressed data A, and interim compressed data B, in that order, to the end of the data quality compressed data.
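Step 2c) is the most intricate part of claim 1, so a hedged Python sketch for single-precision values follows. It deviates from the wording in one labeled way: a "hit" is decided by comparing the bit pattern of (previous value + ΔVj) with that of the current value, rather than by comparing differences, so that the round trip is exact under floating-point rounding. The tagged tuples stand in for the fixed-width header, and all names are assumptions. Finite values are assumed.

```python
import struct
from typing import List, Tuple


def f32_bits(x: float) -> int:
    """Bit pattern of x rounded to IEEE 754 single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_f32(b: int) -> float:
    """Single-precision value with bit pattern b."""
    return struct.unpack(">f", struct.pack(">I", b))[0]


def encode_value(prev: float, cur: float, predicted: List[float]) -> Tuple:
    cur_bits = f32_bits(cur)
    for j, delta in enumerate(predicted):
        if f32_bits(prev + delta) == cur_bits:     # prediction reproduces the value exactly
            return ("hit", j)
    diff = cur - prev
    j = min(range(len(predicted)), key=lambda i: abs(diff - predicted[i]))  # closest ΔVj
    xor = f32_bits(prev + predicted[j]) ^ cur_bits
    n = 32 - xor.bit_length()                      # leading bits on which both agree
    low = cur_bits & ((1 << (32 - n)) - 1)         # low 32-n bits of the current value
    return ("miss", n, j, low)


def decode_value(prev: float, record: Tuple, predicted: List[float]) -> float:
    if record[0] == "hit":
        return bits_f32(f32_bits(prev + predicted[record[1]]))
    _, n, j, low = record
    high = f32_bits(prev + predicted[j]) >> (32 - n) << (32 - n)   # top n bits of prediction
    return bits_f32(high | low)
```

With a previous value of 10.0 and predicted differences [0.5, 1.0], the value 10.5 is a hit on index 0, while 10.75 is a miss whose prediction 10.5 shares the top 13 bits, so only the low 19 bits need to be kept.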
2. The real-time holographic lossless compression method for production time-series data according to claim 1, characterized in that: the time tag compression process further comprises compressing the sequence numbers of the time tags, specifically: while repeating step 1b), keep a running count of the non-regular time tags and compute from it the total number of bytes needed to store the non-regular time tag sequence numbers accumulated so far; if this total exceeds N bits, express the sequence numbers of all N time tags as an N-bit bit field, record the result in the time tag compressed data, and delete the non-regular time tag sequence numbers already recorded in the time tag compressed data.
3. The real-time holographic lossless compression method for production time-series data according to claim 2, characterized in that: the sequence numbers of all N time tags are expressed as an N-bit bit field as follows: each time tag corresponds to one bit position of the N-bit binary value; a 0 marks the time tag at that position as a regular time tag, and a 1 marks it as a non-regular time tag.
4. The real-time holographic lossless compression method for production time-series data according to claim 1 or 2, characterized in that: the time tag compression process further comprises a further compression of the non-regular time tags, specifically:
1i) while repeating step 1b) over all N time tags, find the maximum and the minimum of the non-regular time tags, together with the sequence numbers of the maximum and the minimum, and record them in the time tag compressed data;
1ii) compute the difference T between the maximum and the minimum of the non-regular time tags, forming the continuous integer interval [0, T];
1iii) in the time tag compressed data recorded so far, starting from the first recorded non-regular time tag, record in the time tag compressed data the position within the interval [0, T] of the difference between the current non-regular time tag and the minimum, together with the sequence number of the current non-regular time tag; at the same time delete the non-regular time tags and sequence numbers previously recorded in the time tag compressed data.
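Steps 1a) to 1c) of claim 1 combined with the offset encoding of claim 4 can be sketched as below. This is an illustration under assumptions: time tags are plain integers, at least two are present, the result is a tuple rather than a byte stream, and the sequence numbers of the maximum and minimum are left out for brevity.

```python
from typing import List, Tuple


def compress_time_tags(tags: List[int]):
    """Record only the time tags whose step differs from the predicted difference."""
    first = tags[0]                                  # 1a) first time tag
    predicted = tags[1] - tags[0]                    # 1a) predicted difference
    non_regular: List[Tuple[int, int]] = []          # (sequence number, time tag)
    for pos in range(2, len(tags)):                  # 1b) from the third time tag
        if tags[pos] - tags[pos - 1] != predicted:
            non_regular.append((pos + 1, tags[pos]))
    lo = min((t for _, t in non_regular), default=0)     # 1i) minimum
    hi = max((t for _, t in non_regular), default=0)     # 1i) maximum
    offsets = [(seq, t - lo) for seq, t in non_regular]  # 1iii) position in [0, T]
    return first, predicted, lo, hi - lo, offsets        # hi - lo is T from 1ii)


def restore_time_tags(n: int, first, predicted, lo, span, offsets) -> List[int]:
    """Rebuild all n time tags; regular ones follow from the predicted difference."""
    known = {seq: lo + off for seq, off in offsets}
    out = [first, first + predicted]
    for pos in range(2, n):
        out.append(known.get(pos + 1, out[-1] + predicted))
    return out
```

For a series sampled every tick with two gaps, such as 1000, 1001, 1002, 1005, 1006, 1010, 1011, only the two tags that follow a gap are stored, each as a small offset from 1005.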
5. The real-time holographic lossless compression method for production time-series data according to claim 1, characterized in that: the data quality compression process further comprises compressing the sequence numbers of the non-regular data qualities in interim compressed data B, specifically: if the storage needed for the accumulated sequence numbers of non-regular data qualities exceeds N bits, express the sequence numbers of all N data qualities as an N-bit bit field, and delete the non-regular sequence numbers already recorded in interim compressed data B.
6. The real-time holographic lossless compression method for production time-series data according to claim 5, characterized in that: the sequence numbers of all N data qualities are expressed as an N-bit bit field as follows: each data quality corresponds to one bit position of the N-bit binary value; a 0 marks the data quality at that position as regular, and a 1 marks it as non-regular.
7. The real-time holographic lossless compression method for production time-series data according to claim 1 or 5, characterized in that: the data quality compression process further comprises a secondary compression of interim compressed data B, specifically:
3i) while repeating step 3c) over all N data qualities, find the maximum and the minimum of the data quality differences, together with the sequence numbers of the data qualities that correspond to the maximum and the minimum, and record them in the data quality compressed data;
3ii) compute the difference L between the maximum and the minimum data quality difference, forming the continuous integer interval [0, L];
3iii) starting from the first non-regular data quality difference recorded in interim compressed data B, compute the difference T between the current data quality difference and the minimum data quality difference, and replace the non-regular data quality difference originally recorded in interim compressed data B with T.
8. The real-time holographic lossless compression method for production time-series data according to claim 1, characterized in that: the sequence number of the current time tag, data value, or data quality is recorded as a 2-byte short integer.
9. The real-time holographic lossless compression method for production time-series data according to claim 1, characterized in that: the time tag is a 4-byte long integer that increases either arithmetically or irregularly, or a 4-byte long integer plus a 2-byte integer carrying millisecond precision; the data value is single-precision or double-precision data; the data quality is a 4-byte or 8-byte integer representing the status flags of the current production time-series datum.
10. The real-time holographic lossless compression method for production time-series data according to claim 1, characterized in that: the time tag compression process, the data value compression process, and the data quality compression process each include a global optimization step, specifically: if the compressed data is greater than or equal to the original data in size, the original data is recorded directly.
CN201110317894.3A 2011-10-19 2011-10-19 Real-time holographic lossless compression method for productive time sequence data Active CN102427369B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110317894.3A CN102427369B (en) 2011-10-19 2011-10-19 Real-time holographic lossless compression method for productive time sequence data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110317894.3A CN102427369B (en) 2011-10-19 2011-10-19 Real-time holographic lossless compression method for productive time sequence data

Publications (2)

Publication Number Publication Date
CN102427369A true CN102427369A (en) 2012-04-25
CN102427369B CN102427369B (en) 2014-01-01

Family

ID=45961318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110317894.3A Active CN102427369B (en) 2011-10-19 2011-10-19 Real-time holographic lossless compression method for productive time sequence data

Country Status (1)

Country Link
CN (1) CN102427369B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020141413A1 (en) * 2001-03-29 2002-10-03 Koninklijke Philips Electronics N.V. Data reduced data stream for transmitting a signal
CN101923569A (en) * 2010-07-09 2010-12-22 南京朗坤软件有限公司 Storage method of structure type data of real-time database

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
徐慧: "实时数据库中数据压缩算法的研究", 《中国优秀硕士论文电子期刊网》 *
黄文君等: "数据压缩技术在实时数据库中的应用研究", 《仪器仪表学报》 *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102932001B (en) * 2012-11-08 2015-07-29 大连民族学院 Motion capture data compression, decompression method
CN102932001A (en) * 2012-11-08 2013-02-13 大连民族学院 Method for compressing and decompressing motion capture data
CN104519525A (en) * 2013-09-30 2015-04-15 日月光半导体制造股份有限公司 Devices and methods for transmitting and receiving compressed packet
CN104519525B (en) * 2013-09-30 2018-02-06 日月光半导体制造股份有限公司 Compress the dispensing device and reception device and its sending method and method of reseptance of package
CN104484476A (en) * 2014-12-31 2015-04-01 中国石油天然气股份有限公司 Method and device for compressing and storing indicator diagram data of oil pumping unit
CN104484476B (en) * 2014-12-31 2019-04-12 中国石油天然气股份有限公司 A kind of pumping-unit workdone graphic data compression storage method and device
CN104734726B (en) * 2015-04-01 2017-08-25 东方电子股份有限公司 A kind of time series data line compression method for supporting to edit
CN104734726A (en) * 2015-04-01 2015-06-24 东方电子股份有限公司 Time series data online compression method supporting editing
CN106055275A (en) * 2016-05-24 2016-10-26 深圳市敢为软件技术有限公司 Data compression recording method and apparatus
CN106372181B (en) * 2016-08-31 2019-08-06 东北大学 A kind of big data compression method based on industrial process
CN106372181A (en) * 2016-08-31 2017-02-01 东北大学 Big data compression method based on industrial process
CN106549672A (en) * 2016-10-31 2017-03-29 合肥移顺信息技术有限公司 A kind of three axle data compression methods of acceleration transducer
CN106549672B (en) * 2016-10-31 2019-07-12 合肥移顺信息技术有限公司 A kind of three axis data compression methods of acceleration transducer
CN108153483B (en) * 2016-12-06 2021-04-20 南京南瑞继保电气有限公司 Time sequence data compression method based on attribute grouping
CN108153483A (en) * 2016-12-06 2018-06-12 南京南瑞继保电气有限公司 A kind of time series data compression method based on attribute grouping
CN106877506A (en) * 2017-03-23 2017-06-20 佛山电力设计院有限公司 A kind of host-host protocol compression method of the out-of-limit Monitoring Data of distribution network voltage
CN106877506B (en) * 2017-03-23 2019-06-07 佛山电力设计院有限公司 A kind of transport protocol compression method of the out-of-limit monitoring data of distribution network voltage
CN108981990A (en) * 2018-07-25 2018-12-11 中国石油天然气股份有限公司 Indicator
CN109246086A (en) * 2018-08-16 2019-01-18 上海海压特智能科技有限公司 The transfer approach of director data packet
CN111064471A (en) * 2018-10-16 2020-04-24 阿里巴巴集团控股有限公司 Data processing method and device and electronic equipment
CN111064471B (en) * 2018-10-16 2023-04-11 阿里巴巴集团控股有限公司 Data processing method and device and electronic equipment
CN109684328A (en) * 2018-12-11 2019-04-26 中国北方车辆研究所 A kind of Dimension Time Series compression and storage method
CN111966648A (en) * 2020-07-29 2020-11-20 国机智能技术研究院有限公司 Industrial data processing method and electronic equipment
CN111966648B (en) * 2020-07-29 2023-09-08 国机智能科技有限公司 Industrial data processing method and electronic equipment
CN112702340A (en) * 2020-12-23 2021-04-23 深圳供电局有限公司 Historical message compression method and system, computing device and storage medium
CN113242041A (en) * 2021-03-10 2021-08-10 湖南大学 Data hybrid compression method and system thereof

Also Published As

Publication number Publication date
CN102427369B (en) 2014-01-01

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP03 Change of name, title or address

Address after: 510080 No. 8, Dongfeng East Road, Guangzhou, Guangdong Province

Patentee after: Electric Power Research Institute of Guangdong Power Grid Co.,Ltd.

Address before: Guangzhou City, Guangdong province Yuexiu District 510080 Dongfeng East Road, No. 8 building water Kong Guangdong

Patentee before: ELECTRIC POWER RESEARCH INSTITUTE OF GUANGDONG POWER GRID Corp.
