CN101078931A

CN101078931A - Distributed type double real-time compression method and system

Info

Publication number: CN101078931A
Application number: CN 200710099979
Authority: CN
Inventors: 刘桂雄; 洪晓斌; 金军
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2007-06-01
Filing date: 2007-06-01
Publication date: 2007-11-28
Anticipated expiration: 2027-06-01
Also published as: CN100573385C

Abstract

The invention discloses a PLSR_SBR-based distributed two-layer real-time compression method and system. The concepts of main parameter, auxiliary parameter and compression validity are defined for an IP-mode measurement and control system. The first layer adopts an information decomposition and extraction strategy: the number of principal components is determined by the compression validity, and each main parameter is expressed as a definite model of all the auxiliary parameters, which compresses the main-parameter data. The second layer selects base sequences to build a base signal that reflects the characteristics of the distributed data, divides the original data sequences of the auxiliary parameters into a series of subsequences, and compresses them. The system communicates over Ethernet, and the networked monitoring platform is divided into three layers: an on-site measurement and control layer, an enterprise-level monitoring layer and a remote measurement and control layer. IP-mode intelligent measurement and control devices in the on-site layer collect the parameter data and transmit them to the real-time database server of the enterprise-level monitoring layer, where they are compressed. Remote terminal software in the remote layer obtains the on-site information so that the on-site equipment can be controlled remotely and correctly.

Description

A distributed two-layer real-time compression method and system

Technical field

The present invention relates to a distributed two-layer real-time compression method and system, and in particular to a two-layer real-time compression method and system for measurement and control data in an IP-mode cooperative measurement and control system.

Background technology

As Industrial Ethernet becomes more widely used in measurement and control systems, the number of measurement and control nodes in a system increases considerably. Introducing data compression greatly increases the effective capacity of the database. In addition, compression shortens hard-disk access time during data queries, and because decompression is fast, the system can query data over longer time spans without affecting other operations. More importantly, compression significantly reduces the data volume on the on-site Ethernet measurement and control network, which largely avoids collisions on the Industrial Ethernet and improves system reliability.

Among the real-time database management systems commonly used at home and abroad, two data compression algorithms dominate: the Huffman compression algorithm, represented by the ENDA database, and the swinging door compression algorithm, represented by the PI real-time database. The swinging door algorithm achieves a higher compression ratio than the Huffman algorithm, but it has three shortcomings when applied to an IP-mode real-time database system: (1) The size of the "transit data set" is random. The "transit data set" consists of all data points between the last retained data point and the newest data point. These points are not saved, but the swinging door algorithm must keep them in memory while it runs. If the transit data set becomes too large, it wastes a great deal of memory and degrades the real-time performance of the system. (2) As the transit data set grows, the time needed to check whether its data points lie inside the "parallelogram" grows geometrically and becomes the bottleneck of the real-time system. (3) It is a single-node compression technique and does not consider the large amount of redundant information that exists among the data of multiple nodes in a distributed system.

In view of the above, a new compression method must be developed around the data characteristics of the IP-mode cooperative measurement and control system, because these characteristics affect the compression ratio, accuracy, real-time performance and stability of measurement and control data compression.

The real-time measurement and control data of an IP-mode cooperative measurement and control system have the following characteristics: (1) the acquisition rate is high; (2) the validity of the data is time-limited; (3) the distributed data contain a certain degree of redundancy; (4) the parameters are multiply correlated, and some parameters are influenced by several others. How to improve the compression ratio and accuracy of distributed measurement and control data by finding and exploiting the coupling information among multiple parameters is therefore an important research direction for real-time databases in IP-mode cooperative measurement and control systems.

Chinese patent application CN 200510115119.4 relates to a real-time data compression method for process control systems, in which the real-time data comprise analog values. The method comprises: 1) initializing a dictionary; 2) reading in values; 3) subtracting adjacent real-time data values to obtain differences, and storing the first value read in the compressed file; 4) compressing the differences with the LZW algorithm. That invention achieves lossless compression of real-time data and does not address lossy compression.

Chinese patent CN00813613.0, a data compression method based on cubic spline interpolation, is a method and system for obtaining a transform of a signal to produce a compressed signal and for computing the inverse transform of the compressed signal. It does not address the compression of distributed coupling information.

Chinese patent CN 02120383.0, an adaptive historical data compression method, compresses the historical data in a database. It achieves adaptive compression of historical data in three steps: compression time judgment, slope calculation, and compression verification judgment. It is a lossy compression method and does not address the compression of distributed coupling information.

Chinese patent CN 98124625.7, a method for digital data compression, comprises the steps of: decomposing the data values into a series of components having low-frequency, high-frequency and mixed-frequency components; determining, in each component, a first number of coefficients whose amplitude exceeds a predetermined component threshold; decomposing each component into a series of subcomponents having low-frequency, high-frequency and mixed subcomponents; determining, in each subcomponent, a second number of coefficients whose amplitude exceeds a predetermined subcomponent threshold; determining whether the component should be decomposed into the subcomponents; and, when decomposing, applying the preceding steps to each subcomponent until a predetermined decomposition level is reached. This method is a wavelet decomposition of a set of data values and does not involve the PLSR_SBR method.

There are also Chinese patents that mainly concern the compression of other kinds of information (such as text, sound and images). For example, Chinese patent CN03114423.3, a data compression method based on the high-order entropy of the information source, is based on the high-order (2nd- or 3rd-order) entropy of the source and can significantly improve the compression ratio. It is an adaptive algorithm that does not need to know the frequency of occurrence of each source symbol in advance, and it builds a non-prefix code directly from the frequencies of occurrence without building a binary tree. It is applicable to the compression and decompression of all digital files, can serve as the entropy coding stage in lossy compression methods for images and sound, and can be used to compress and decompress various real-time streaming media. The present invention, by contrast, is directed at the compression of numerical data.

The present invention is based on the IP-mode cooperative measurement and control system and realizes two-layer real-time compression of the system's distributed coupling information through PLSR_SBR. Its computation is simpler, its real-time performance is good, and its precision is high.

Summary of the invention

In view of the state of the prior art and its problems described above, the purpose of the present invention is to provide a distributed two-layer real-time compression method and system based on PLSR_SBR (PLSR: Partial Least-Squares Regression; SBR: Self-Based Regression). It focuses on the key issues of multi-parameter distributed data compression in the real-time database of an IP-mode cooperative measurement and control system: compression ratio, compression accuracy, compression speed and suitability for industrial environments. Its core is, for a given IP-mode cooperative measurement and control system structure, to apply the mathematical methods of PLSR_SBR to perform two-layer real-time compression of multi-parameter distributed coupling information. The method offers a high compression ratio and good real-time performance.

The objective of the invention is to be achieved through the following technical solutions:

First layer: the main parameters and auxiliary parameters among the multiple parameters are determined. Based on the basic principle of PLSR, three main steps are carried out, namely data standardization, component extraction with the number of principal components determined by the compression validity, and determination of the PLSR regression model equation, to establish the mathematical relationship between one main parameter and all the auxiliary parameters. If there are several main parameters, the procedure is repeated for each of them. This realizes the first-layer compression.

Second layer: for all the remaining auxiliary parameters, base sequences are selected to build a base signal that reflects the characteristics of the distributed data, and the original data sequences of the auxiliary parameters are divided into a series of subsequences and compressed. The number of subsequences in the base signal is determined by binary search, under the constraint that the global error obtained by fitting all the remaining original data sequences with the first Ins base sequences is smaller than the global error obtained with the first Ins-1 or Ins+1 base sequences. Each original data sequence of an auxiliary parameter is fitted by least squares with step-by-step shifting, starting from the left end of the base signal; one fit yields a five-tuple (row, shift, a1, b1, error1). The original data sequence is then moved one position to the right along the base signal, which yields another five-tuple (row, shift, a2, b2, error2). The sequence is shifted in this way until it can be shifted no further; the errors of all the five-tuples are compared, and the five-tuple with the smallest error represents the original data sequence. All the original data sequences are then sorted in descending order of their fitting error. The original data sequence with the largest error is split evenly into two subsequences, each of which is mapped to the base signal again, and the whole new set of data sequences is re-sorted in descending order of fitting error. To reduce the final number of subsequences and improve the compression ratio under the system total error ε, the splitting is repeated as long as the total error ε is not met, until the length of the subsequences after splitting is less than 6. If the total-error condition is still not met, then, based on a combined evaluation of usage amount and cumulative error, the subsequence with the largest benefit value among the base sequences of the preceding period is selected as a replacement.

The system of the present invention comprises:

The system communicates over Ethernet, and the whole Industrial Ethernet monitoring platform based on TCP/IP (hereinafter the IP-mode monitoring platform) is divided into three layers:

First layer (on-site measurement and control layer): IP-mode intelligent measurement and control devices distributed at various on-site locations process the environmental parameter signals they collect and send the data onto the Ethernet in real time. The on-site microcomputer, based on Delphi technology, performs centralized, coordinated management. The control strategy sent by the host computer is received through the IP-mode fuzzy control device, which drives the on-site controlled equipment group to carry out field control.

Second layer (enterprise-level monitoring layer): based on Java and virtual instrument technology, it comprises three independently running modules: strategy editing software, monitoring terminal software and real-time kernel software. Graphical measurement and control strategies can be edited in the strategy editing software; after compilation a strategy becomes a series of micro-instructions, which the real-time kernel software sends to the on-site IP-mode measurement and control devices for execution. The monitoring terminal software obtains instrument panel display data from the real-time kernel software and generates the graphical user interface; it can run only if the real-time kernel software is running normally. Several monitoring terminals can connect to one real-time kernel at the same time, which enables monitoring from multiple locations.

Third layer (remote measurement and control layer): based on JSP and XML technology. Through remote terminal software or a web browser, users can, according to their own permissions, analyze the data remotely, which provides data support for managers or experts when they issue measurement and control commands or make decisions.

As can be seen from the technical solution provided above, the PLSR_SBR-based distributed two-layer real-time compression method of the present invention is built on the IP-mode cooperative measurement and control system: multiple IP-mode intelligent measurement and control devices distributed on site collect the values of multiple parameters and upload the data through Ethernet interfaces to the real-time database server of the enterprise-level monitoring layer, where two-layer real-time compression is performed.

Description of drawings

Fig. 1 is a diagram of an embodiment of the IP-mode cooperative measurement and control system of the present invention;

Fig. 2 is a diagram of the IP-mode real-time database system for the IP-mode cooperative measurement and control system of the present invention;

Fig. 3 is a flow chart of the PLSR_SBR-based two-layer real-time compression method for the IP-mode cooperative measurement and control system of the present invention;

Fig. 4 is a flow chart of building the base sequences in the real-time compression method of the present invention;

Fig. 5 is a flow chart of the variable-length subsequence decomposition in the real-time compression method of the present invention.

Specific embodiment

The structure of a specific embodiment of the invention is shown in Fig. 1. The embodiment comprises: a first layer, the on-site measurement and control layer, which includes the on-site sensing and controlled-equipment modules, the IP-mode intelligent measurement and control devices, the IP fuzzy control devices and the on-site controlled equipment group; a second layer, the enterprise-level monitoring layer, which includes an Ethernet switch or hub, an on-site monitoring microcomputer, a real-time database server, a measurement and control strategy server and a web server; and a third layer, the Internet-based remote measurement and control layer, which mainly includes remote workstations and terminal microcomputers. The working principle and implementation of this embodiment are as follows:

In this embodiment, several IP-mode intelligent measurement and control devices of the IP-mode cooperative measurement and control system are first placed in the respective monitored areas. Each device has 8 channels and is connected to the parameters that the application requires to be acquired; the same channel on every device is connected to the same kind of sensed parameter (same parameter number and parameter type). Through a switch, each device uploads the acquired value of every parameter to the IP-mode real-time database server of the enterprise-level monitoring layer (see Fig. 2). Within a limited time, the real-time database takes into account the coupling relationships among the parameters in all the devices and, based on the PLSR and SBR principles, performs two-layer real-time compression of the multi-parameter data.

The technical core of the invention is, in the IP-mode real-time database, the first-layer compression of the coupling relationship between the main parameter and the auxiliary parameters based on the PLSR principle, and the second-layer compression of the multiple correlations among the remaining auxiliary parameters based on the SBR method (see Fig. 3). Specifically:

1. First-layer compression method based on PLSR:

Definition 1. Among the multiple parameters of the IP-mode cooperative measurement and control system, some parameters are closely related to a number of other parameters. A parameter that is determined by other parameters is called a main parameter, and the other parameters that influence it are called auxiliary parameters. Suppose the IP-mode cooperative monitoring platform has m sensed parameters, of which p are auxiliary parameters (also called independent variables) $\{x_1, x_2, \ldots, x_p\}$. After n sample points are observed, they form the independent-variable data table $X = [x_1, x_2, \ldots, x_p]_{n \times p}$. Because the case of a single main parameter (also called dependent variable) is the more common one among the sensed parameters of the IP-mode cooperative measurement and control system, the PLSR_SBR-based two-layer real-time compression method is studied mainly for a single dependent variable, whose data are written $Y = [y]_{n \times 1}$. If q parameters are each related to the other parameters in some way, the dependent-variable data are written $Y = [y_1, y_2, \ldots, y_q]_{n \times q}$. When the PLSR model is built, the choice of the number of principal components is very important. Among the computed principal components, the first is the most important, and importance decreases as the component index increases, so that the later components mostly reflect noise. The dimension of the model can therefore be determined by a compression validity test, where compression validity is defined as follows.

Definition 2. Suppose h components are to be extracted. First, a sample point i is removed, and a regression equation with h components is fitted on the set of all remaining sample points (n-1 samples). The excluded sample point i is then substituted into the fitted equation, which gives the fitted value $\hat{y}_{h(-i)}$ of the dependent variable at sample point i.

Repeating these steps for every sample i = 1, 2, ..., n gives the prediction error sum of squares $\mathrm{PRESS}_h$ of the dependent variable y:

$$\mathrm{PRESS}_h = \sum_{i=1}^{n} \left(y_i - \hat{y}_{h(-i)}\right)^2 \tag{1}$$

Next, a regression equation with h components is fitted using all the sample points. If the predicted value at the i-th sample point is denoted $\hat{y}_{hi}$, the error sum of squares $SS_h$ of y is

$$SS_h = \sum_{i=1}^{n} \left(y_i - \hat{y}_{hi}\right)^2 \tag{2}$$

The compression validity of the h-th component is then

$$Q_h^2 = 1 - \frac{\mathrm{PRESS}_h}{SS_{h-1}} \tag{3}$$

The compression validity measures the marginal contribution of component $t_h$ to the precision of the fitted model on the following scale: when

$$Q_h^2 \geq 1 - 0.95^2 = 0.0975$$

the marginal contribution of component $t_h$ is significant. Clearly, $Q_h^2 \geq 0.0975$ and $\mathrm{PRESS}_h / SS_{h-1} \leq 0.95^2$ are exactly equivalent decision rules. In that case adding component $t_h$ improves the model noticeably, so adding $t_h$ is considered clearly useful.
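The decision rule of Definition 2 can be sketched in a few lines of Python. This is only an illustration: the function names, the list layout (`press[k]` holds PRESS for k+1 components, `ss[k]` holds SS for k components) and the choice to stop at the first non-significant component are assumptions, not details given in the text.

```python
THRESHOLD = 1.0 - 0.95 ** 2  # the 0.0975 limit from the text


def compression_validity(press_h: float, ss_prev: float) -> float:
    """Q_h^2 = 1 - PRESS_h / SS_(h-1), following equation (3)."""
    return 1.0 - press_h / ss_prev


def component_is_significant(press_h: float, ss_prev: float) -> bool:
    """True when the marginal contribution of component t_h is significant."""
    return compression_validity(press_h, ss_prev) >= THRESHOLD


def select_component_count(press, ss):
    """Count leading components that pass the test (assumed stopping rule).

    press[k] is PRESS for k+1 components; ss[k] is SS for k components,
    so ss[0] is the error sum of squares before any component is extracted.
    """
    count = 0
    for h in range(1, len(press) + 1):
        if not component_is_significant(press[h - 1], ss[h - 1]):
            break
        count = h
    return count
```

With hypothetical values `press = [40.0, 30.0, 29.5]` and `ss = [100.0, 35.0, 28.0, 27.0]`, the first two components pass and the third does not, so two components would be kept.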

In the first-layer compression of the PLSR_SBR-based two-layer real-time compression method, components $t_j$ and $u$ are extracted from X and Y respectively, where $t_j$ is a linear combination of $x_1, x_2, \ldots, x_p$ and $u$ is a linear combination of y. When the components are extracted, $t_j$ and $u$ must satisfy the following two conditions:

1. $t_j$ and $u$ should each carry as much of the variation information in their own data table as possible;

2. the correlation between $t_j$ and $u$ should be as large as possible.

The first-layer compression method comprises the following five steps:

(1) Data standardization

The dependent and independent variables are standardized separately. A translation (centering) moves the center of gravity of the data to the origin, and scaling removes the effect of units (dimensions). After standardization, the center of gravity of the data coincides with the origin.

$$F_0 = (F_{01}, F_{02}, \ldots, F_{0q})_{n \times q}, \quad F_{0j} = y_j^* = \frac{y_j - E(y_j)}{S_{y_j}} \quad (j = 1, 2, \ldots, q) \tag{4}$$

$$E_0 = (E_{01}, E_{02}, \ldots, E_{0p})_{n \times p}, \quad E_{0i} = x_i^* = \frac{x_i - E(x_i)}{S_{x_i}} \quad (i = 1, 2, \ldots, p) \tag{5}$$

where $F_0$ and $E_0$ are the standardized matrices of Y and X; $E(y_j)$ and $E(x_i)$ are the means of the columns of Y and X; $S_{y_j}$ and $S_{x_i}$ are the standard deviations of the columns of Y and X; and n is the sample size.
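The standardization of equations (4) and (5) might be sketched as follows. The function names are illustrative, and the use of the sample standard deviation (divisor n-1) is an assumption, since the text does not say which divisor is meant.

```python
import math


def standardize(column):
    """Center a column on its mean and scale it by its standard deviation.

    Assumption: the sample standard deviation (divisor n - 1) is used.
    """
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / std for v in column]


def standardize_table(rows):
    """Standardize every column of an n x p table given as a list of rows."""
    columns = [standardize(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]
```

Applied to the independent-variable table X this gives $E_0$, and applied to Y it gives $F_0$; every standardized column then has mean zero, so the center of gravity lies at the origin.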

(2) Extraction of the first component $t_1$

$F_0$ and $E_0$ are known. The first component $t_1$ is extracted from $E_0$: $t_1 = E_0 W_1$, where $W_1$ is the first axis of $E_0$ and $\|W_1\| = 1$. $t_1$ is a linear combination of the standardized variables $x_1^*, x_2^*, \ldots, x_p^*$, i.e. a recombination of the original information, and $W_1$ holds the combination coefficients. The first component $u_1$ is extracted from $F_0$: $u_1 = F_0 C_1$, where $C_1$ is the first axis of $F_0$ and $\|C_1\| = 1$. $t_1$ and $u_1$ are required to represent the variation in X and Y well, and $t_1$ must have the greatest possible explanatory power for $u_1$. Following the ideas of principal component analysis and canonical correlation analysis, this amounts to requiring that the covariance of $t_1$ and $u_1$ be maximal, which is an optimization problem. That is,

$$\mathrm{Cov}(t_1, u_1) = \sqrt{\mathrm{Var}(t_1)\,\mathrm{Var}(u_1)}\; r(t_1, u_1)$$

is to be maximized, where $r(t_1, u_1)$ is the correlation coefficient of $t_1$ and $u_1$. Therefore, under the constraints $\|W_1\| = 1$ and $\|C_1\| = 1$, the maximum of $W_1^T E_0^T F_0 C_1$ is sought. Using the Lagrange multiplier method, the derivation gives

$$E_0^T F_0 F_0^T E_0 W_1 = \theta_1^2 W_1 \tag{6}$$

$$F_0^T E_0 E_0^T F_0 C_1 = \theta_1^2 C_1 \tag{7}$$

$\theta_1$ is exactly the value of the objective function of the optimization problem. $W_1$ is an eigenvector of $E_0^T F_0 F_0^T E_0$ and $\theta_1^2$ is the corresponding eigenvalue. For $\theta_1$ to be maximal, $W_1$ must be the unit eigenvector corresponding to the largest eigenvalue of $E_0^T F_0 F_0^T E_0$, and $C_1$ the unit eigenvector corresponding to the largest eigenvalue $\theta_1^2$ of $F_0^T E_0 E_0^T F_0$.

Because the dependent variable $F_0$ in the multidimensional data of the IP-mode real-time database is a single variable, $C_1$ is a constant, and

$$\theta_1^2 = \|E_0^T F_0\|^2 \tag{8}$$

From the iterative formulas of partial least-squares regression,

$$W_1 = \frac{E_0^T u_1}{\theta_1} = \frac{E_0^T F_0}{\|E_0^T F_0\|} \tag{9}$$

Because the columns of $E_0$ and $F_0$ are standardized, this gives

$$W_1 = \frac{1}{\sqrt{\sum_{j=1}^{p} r^2(x_j, y)}} \begin{bmatrix} r(x_1, y) \\ \vdots \\ r(x_p, y) \end{bmatrix} \tag{10}$$

Once $W_1$ has been obtained, the component is

$$t_1 = E_0 W_1 = \frac{1}{\sqrt{\sum_{j=1}^{p} r^2(x_j, y)}} \left[ r(x_1, y) E_{01} + \cdots + r(x_p, y) E_{0p} \right] \tag{11}$$

The regression equations of $E_0$ and $F_0$ on $t_1$ are then

$$E_0 = t_1 p_1^T + E_1 \tag{12}$$

$$F_0 = t_1 r_1 + F_1 \tag{13}$$

where

$$p_1 = \frac{E_0^T t_1}{\|t_1\|^2}, \qquad r_1 = \frac{F_0^T t_1}{\|t_1\|^2}$$

are the corresponding regression coefficient vector and scalar. The residual matrices are

$$E_1 = E_0 - t_1 p_1^T = [E_{11}, \ldots, E_{1p}] \tag{14}$$

$$F_1 = F_0 - t_1 r_1 \tag{15}$$
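Equations (10) and (11) say that, for a single dependent variable, the first axis is simply the normalized vector of correlation coefficients between each auxiliary parameter and the main parameter. A minimal Python sketch of that reading follows; the function names and the column-wise data layout are illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r(x, y) of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_axis(aux_columns, main):
    """W_1 as in equation (10): correlation vector scaled to unit length."""
    r = [pearson(col, main) for col in aux_columns]
    norm = math.sqrt(sum(v * v for v in r))
    return [v / norm for v in r]


def first_component(std_aux_columns, w1):
    """t_1 = E_0 W_1 as in equation (11); columns must already be standardized."""
    n = len(std_aux_columns[0])
    return [sum(w * col[i] for w, col in zip(w1, std_aux_columns)) for i in range(n)]
```

An auxiliary parameter that is strongly correlated with the main parameter therefore receives a large weight in $t_1$, while a weakly correlated one contributes little.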

(3) Extraction of the second component $t_2$

Replacing $E_0$ with $E_1$ and $F_0$ with $F_1$, the second axis $W_2$ and the second component $t_2$ are obtained by the same method:

$$W_2 = \frac{E_1^T F_1}{\|E_1^T F_1\|} = \frac{1}{\sqrt{\sum_{j=1}^{p} \mathrm{Cov}^2(E_{1j}, F_1)}} \begin{bmatrix} \mathrm{Cov}(E_{11}, F_1) \\ \vdots \\ \mathrm{Cov}(E_{1p}, F_1) \end{bmatrix} \tag{16}$$

$$t_2 = E_1 W_2 \tag{17}$$

$W_2$ is the unit eigenvector corresponding to the largest eigenvalue $\theta_2^2$ of the matrix $E_1^T F_1 F_1^T E_1$.

Regressing $E_1$ and $F_1$ on $t_2$ gives

$$E_1 = t_2 p_2^T + E_2 \tag{18}$$

$$F_1 = t_2 r_2 + F_2 \tag{19}$$

where

$$p_2 = \frac{E_1^T t_2}{\|t_2\|^2}, \qquad r_2 = \frac{F_1^T t_2}{\|t_2\|^2}$$

(4) Extraction of the h-th component $t_h$

In the same way, the h-th component $t_h$ is obtained. The number of components to extract is determined by the compression validity principle, and h is less than the rank of X.

(5) Derivation of the PLSR regression model equation

The least-squares regression equations of $E_0$ and $F_0$ on $t_1, t_2, \ldots, t_h$ are

$$E_0 = t_1 p_1^T + t_2 p_2^T + \cdots + t_h p_h^T \tag{20}$$

$$F_0 = t_1 r_1 + t_2 r_2 + \cdots + t_h r_h \tag{21}$$

Because $t_1, t_2, \ldots, t_h$ are all linear combinations of $E_{01}, E_{02}, \ldots, E_{0p}$, the properties of partial least-squares regression give

$$t_h = E_{h-1} W_h = E_0 \prod_{j=1}^{h-1} (I - W_j p_j^T) W_h = E_0 W_h^* \tag{22}$$

where

$$W_h^* = \prod_{j=1}^{h-1} (I - W_j p_j^T) W_h$$

so that

$$F_0 = E_0 (r_1 W_1^* + \cdots + r_h W_h^*) \tag{23}$$

Writing

$$x_j^* = E_{0j}, \qquad y^* = F_0, \qquad a_j = \sum_{k=1}^{h} r_k W_{kj}^*$$

where $W_{kj}^*$ is the j-th element of $W_k^*$, the regression equation is

$$\hat{y}^* = a_1 x_1^* + a_2 x_2^* + \cdots + a_p x_p^* \tag{24}$$

When there are several main parameters, the coefficients obtained for the j-th main parameter are written $a_{1j}, a_{2j}, \ldots, a_{pj}$.

As the above procedure shows, the first layer of the PLSR_SBR two-layer real-time compression method adopts a strategy of information decomposition and extraction. From the multivariable system $x_1, x_2, \ldots, x_p$ it extracts composite variables $t_1, t_2, \ldots, t_h$ ($h < p$) one by one. This amounts to recombining and extracting the information in $x_1, x_2, \ldots, x_p$ so as to obtain the composite variables that explain Y best while also summarizing the information in the independent-variable set X. At the same time, information that does not help explain Y is naturally excluded, which yields good information compression. If the multidimensional parameters contain several dependent variables that are each represented by several other parameters, the first-layer compression procedure is run again for each of them.
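The five steps of the first layer can be put together in a short sketch for one main parameter. It is written in plain Python for readability rather than speed, it assumes the inputs are already standardized, and the function name and argument layout are illustrative rather than taken from the text.

```python
import math


def pls1_coefficients(E0, F0, h):
    """Regression coefficients of F0 on the columns of E0 using h components.

    E0: n rows of p standardized auxiliary values; F0: n standardized values
    of the main parameter. Assumes the residuals stay nonzero for all h steps.
    """
    n, p = len(E0), len(E0[0])
    E = [row[:] for row in E0]
    F = F0[:]
    W, P, R = [], [], []
    for _ in range(h):
        # Axis: W = E^T F / ||E^T F||
        w = [sum(E[i][j] * F[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        # Component t = E W and its regression coefficients p (vector), r (scalar)
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        r = sum(F[i] * t[i] for i in range(n)) / tt
        # Residuals: E <- E - t p^T, F <- F - t r
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        F = [F[i] - t[i] * r for i in range(n)]
        W.append(w)
        P.append(load)
        R.append(r)
    # a = sum_k r_k W_k^*, with W_k^* = prod_{j<k} (I - W_j p_j^T) W_k
    a = [0.0] * p
    for k in range(h):
        v = W[k][:]
        for j in range(k - 1, -1, -1):
            dot = sum(P[j][c] * v[c] for c in range(p))
            v = [v[c] - W[j][c] * dot for c in range(p)]
        a = [a[c] + R[k] * v[c] for c in range(p)]
    return a
```

When h equals the rank of the auxiliary data, the coefficients coincide with ordinary least squares; using fewer components trades some fitting precision for robustness to noise, which is what the compression validity test is meant to balance.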

From the above analysis, in the PLSR_SBR-based two-layer real-time compression method the result I of the first-layer compression stores each dependent-variable data sequence as a (1+p)-tuple:

1. row: the index of the main parameter among the multidimensional parameters;

2. $a_{1j}, a_{2j}, \ldots, a_{pj}$: the regression model parameters, where 1, 2, ..., p are the indices of the auxiliary parameters among the multidimensional parameters.
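As an illustration of how such a (1+p)-tuple could be held and used, the sketch below stores the row index with the p coefficients and evaluates the regression equation on standardized auxiliary values. The class and function names are hypothetical. Converting the result back to physical units would additionally need the mean and standard deviation of the main parameter, which the tuple described above does not list.

```python
from typing import NamedTuple, Sequence, Tuple


class Layer1Record(NamedTuple):
    """First-layer result for one main parameter: (row, a_1j, ..., a_pj)."""
    row: int                          # index of the main parameter
    coefficients: Tuple[float, ...]   # one coefficient per auxiliary parameter


def predict_standardized(record: Layer1Record, aux_std_values: Sequence[float]) -> float:
    """Evaluate y* = a_1 x_1* + ... + a_p x_p* for one sample point."""
    return sum(a * x for a, x in zip(record.coefficients, aux_std_values))
```

For a hypothetical record `Layer1Record(row=3, coefficients=(0.5, -0.25, 1.0))` and standardized auxiliary values `[2.0, 4.0, 1.0]`, the predicted standardized value of the main parameter is 1.0.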

2. Second-layer compression method based on improved SBR:

After the first-layer compression of the PLSR_SBR-based two-layer real-time compression method, several strongly correlated independent variables (such as temperature, humidity and pressure) remain, and the second-layer compression can be carried out on them directly. Let $x_i = \{x_{i1}, \ldots, x_{in}\}$ and $x_j = \{x_{j1}, \ldots, x_{jn}\}$ be two independent-variable data sequences with n sample points each; a distance measure expresses the distance between the i-th and j-th sequences. Because data sequences in the IP-mode real-time database are subject to strong time constraints and contain few data points, and in particular because the measured values of many parameters vary little and their linear correlation is pronounced, the distance between sequences is defined as follows.

Definition 3. A least-squares linear fit of $x_i = \{x_{i1}, \ldots, x_{in}\}$ is made on the basis of $x_j = \{x_{j1}, \ldots, x_{jn}\}$, giving the fitted sequence $\hat{x}_i$. Based on the Euclidean distance, the fitted model is evaluated by

$$d(x_i, \hat{x}_i) = \left( \sum_{k=1}^{n} (x_{ik} - \hat{x}_{ik})^2 \right)^{1/2}$$

and it is required that

$$\sum d(x_i, \hat{x}_i) \leq \delta$$

where δ is the specified total error of the data sequence fit.
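Definition 3 can be illustrated with a small least-squares fit of one sequence on another. The sketch below returns the slope a, the intercept b and the Euclidean fitting error; the function names and the handling of a constant base sequence are assumptions.

```python
import math


def fit_linear(base, target):
    """Least-squares fit target ~ a * base + b; returns (a, b, error).

    error is the Euclidean distance between target and its fitted values.
    A constant base sequence is fitted with a = 0 (assumed convention).
    """
    n = len(base)
    mean_b, mean_t = sum(base) / n, sum(target) / n
    sbb = sum((u - mean_b) ** 2 for u in base)
    sbt = sum((u - mean_b) * (v - mean_t) for u, v in zip(base, target))
    a = sbt / sbb if sbb > 0 else 0.0
    b = mean_t - a * mean_b
    error = math.sqrt(sum((v - (a * u + b)) ** 2 for u, v in zip(base, target)))
    return a, b, error


def within_total_error(errors, delta):
    """Check the requirement that the summed fitting errors do not exceed delta."""
    return sum(errors) <= delta
```

Two sequences that differ only by scale and offset, such as readings from similar sensors, fit each other with an error close to zero, which is what makes storing only (a, b) worthwhile.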

Definition 4. If a least-squares linear fit of $x_i = \{x_{i1}, \ldots, x_{ik}\}$, with $k < n$, is made on the basis of $x_j = \{x_{j1}, \ldots, x_{jn}\}$, then the usage amount of $x_j$ is said to be k.

In the second layer of the PLSR_SBR two-layer real-time compression method, every independent-variable data sequence is divided into several subsequences, each of which is finally mapped onto the base signal. The mapping result II is expressed as a tuple of 5 elements:

1. row: the row number of the data sequence. When a data sequence is split during variable-length decomposition, a 1 or 2 is appended to the original row number to distinguish the left and right subsequences after the split (the number becomes two digits; for example, 1 becomes 11 and 12). If the subsequences are split further, more digits are appended in the same way, so that the subsequences can be identified during decompression.

2. shift: the offset of the subsequence in the base signal;

3. a, b: the parameters of the simple linear regression;

4. error: the total error of the simple linear regression.

Each fitted data sequence can thus be represented as (row, shift, a, b, error).
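The five-element record and the row-numbering rule can be sketched as follows. The class and function names are hypothetical, and the `length` argument of `reconstruct` is an assumption: the tuple itself does not store the subsequence length, so a decompressor would have to derive it, for example from the number of digits appended to the row number.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Mapping(NamedTuple):
    """Second-layer result for one subsequence: (row, shift, a, b, error)."""
    row: int
    shift: int
    a: float
    b: float
    error: float


def child_rows(row: int) -> Tuple[int, int]:
    """Row numbers of the left and right halves after a split (1 -> 11, 12)."""
    return row * 10 + 1, row * 10 + 2


def reconstruct(mapping: Mapping, base_signal: Sequence[float], length: int) -> List[float]:
    """Approximate the subsequence as a * base_signal[shift : shift + length] + b."""
    window = base_signal[mapping.shift:mapping.shift + length]
    return [mapping.a * v + mapping.b for v in window]
```

For example, `child_rows(12)` gives `(121, 122)`, so the digits after the original row number record the path of left and right splits.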

Mainly comprise following three steps:

(1) sets up the basis signal of forming by basic sequence.

The foundation of basis signal is based on a most important link in the second layer of PLSR_SBR double real-time compression method, it is the benchmark that the data left sequence is carried out the match compression, has determined the compressibility of IP pattern real-time data base and the sequence global error size after the compression.The basis sequence is the parton sequence of electing from remaining sequence.Because basis signal is made up of basic sequence, therefore selecting the subsequence number of basic sequence in the composition basis signal is the key of research, and its implementation procedure as shown in Figure 4.

The remaining m-q auxiliary parameter sequences (each with n data points) are concatenated into one long sequence, which is then divided into K groups of W data points each, where

W = \sqrt{(m - q) \times n},

that is,

K = \sqrt{(m - q) \times n}

Each group of W data points is called a subsequence. Each subsequence i is then used in turn to fit every subsequence j (including itself), which gives a fitting error Error_j for each fit. Subject to the constraint linearErr(Cand_j) ≥ Error_j, once all subsequences have been fitted, the benefit value of the current subsequence i is obtained from the following formula:

benefit = \sum_{j=1}^{k} \left( linearErr(Cand_{j}) - Error_{j} \right) \quad (25)

where linearErr(Cand_j) is the self-fitting error of the sequence being fitted.
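The splitting and benefit calculation above can be sketched in Python. Several points are assumptions rather than statements of the patent: errors are measured as sums of squared residuals, linearErr(Cand_j) is taken to be the error of a straight-line fit of subsequence j against its own sample index, W is rounded down to an integer with leftover points dropped, and fits that violate the constraint are simply skipped. The function names are hypothetical, and the input is assumed to be non-empty.

```python
import math


def fit_line(x, y):
    """Least-squares fit y ~ a*x + b; returns (a, b, sum of squared residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx if sxx else 0.0
    b = my - a * mx
    err = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    return a, b, err


def split_subsequences(series):
    """Concatenate the auxiliary sequences and cut them into chunks of W points."""
    flat = [v for s in series for v in s]
    w = math.isqrt(len(flat))  # W = floor(sqrt((m - q) * n)), an assumption
    return [flat[i:i + w] for i in range(0, len(flat) - w + 1, w)]


def benefit(i, subs):
    """Benefit of subsequence i as in formula (25), under the stated assumptions."""
    cand = subs[i]
    index = list(range(len(cand)))
    total = 0.0
    for target in subs:
        _, _, self_err = fit_line(index, target)  # linearErr(Cand_j), assumed
        _, _, err = fit_line(cand, target)        # Error_j: fit j using i
        if self_err >= err:                       # constraint from the text
            total += self_err - err
    return total


subs = split_subsequences([[0, 1, 4, 9], [0, 2, 8, 18], [1, 1, 1, 1], [3, 3, 3, 3]])
benefits = [benefit(i, subs) for i in range(len(subs))]
```

In this toy input the first two subsequences are exact multiples of each other, so each fits the other with zero error and both receive a higher benefit than the two constant subsequences.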

All subsequences are sorted in descending order of benefit value and stored in the variable base_list. A binary search (see the right-hand part of Figure 4) is then used to select the first Ins subsequences in base_list (called base sequences) to form the basis signal, such that the overall error obtained by fitting all the remaining original data sequences with the first Ins base sequences is smaller than the overall error obtained with the first Ins-1 or Ins+1 base sequences. The overall error of fitting the remaining measurement data sequences with Ins base sequences is computed by the following step.
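One way to realise this search is a binary search for a local minimum of the overall error as a function of Ins. Figure 4 is not reproduced here, so this is only a sketch under assumptions: global_error is a hypothetical callable that returns the overall error from step (2) for a given number of base sequences, and choose_ins is a hypothetical name.

```python
from typing import Callable


def choose_ins(global_error: Callable[[int], float], max_ins: int) -> int:
    """Binary search for an Ins whose error is no larger than its neighbours'."""
    lo, hi = 1, max_ins
    while lo < hi:
        mid = (lo + hi) // 2
        if global_error(mid) > global_error(mid + 1):
            lo = mid + 1  # error still falling: a minimum lies to the right
        else:
            hi = mid      # error rising or flat: a minimum is at mid or left
    return lo


# Illustrative error values only, not measured data.
errors = {1: 9.0, 2: 5.0, 3: 2.0, 4: 3.0, 5: 7.0, 6: 8.0}
ins = choose_ins(lambda k: errors[k], max_ins=6)
```

Each probe costs one full evaluation of step (2), so the search needs on the order of log2(max_ins) pairs of evaluations instead of one per candidate value of Ins.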

(2) Perform variable-length decomposition of the remaining original data sequences to obtain the final total fitting error ε.

The whole process of variable-length decomposition of the remaining original data sequences is shown in Figure 5: a mapping onto the basis signal is established for each of the remaining data sequences. The mapping takes the basis signal as the reference. The original data sequence is first fitted starting from the leftmost end of the base sequences, and each fit yields one five-element tuple (row, shift, a₁, b₁, error₁). The original data sequence is then moved one position to the right along the basis signal, that is, shift = shift + 1, which yields another five-element tuple (row, shift, a₂, b₂, error₂). The original data sequence is shifted in this way until shift = s × w - w, which gives (s-1) × w + 1 five-element tuples in total. The errors of these tuples are compared, and the tuple with the smallest error represents the original data sequence. All original data sequences are then sorted in descending order of their final fitting error. The first original data sequence is split into two equal subsequences, each of which is mapped onto the basis signal in turn, after which the sequences are again sorted in descending order of fitting error. To reduce the final number of subsequences and improve the compression ratio, the procedure above is repeated for as long as the total error is not less than the system total error ε_total, until the length of the subsequences produced by decomposition is less than 6. The final total fitting error ε is calculated by the formula in Definition 3.
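The shift-and-fit mapping and the repeated halving can be sketched as follows. This is a simplified reading, not the patented procedure itself: the total error is assumed to be the sum of the per-sequence squared-residual errors, a sequence is not split if its halves would be shorter than min_len, the loop stops as soon as the worst sequence cannot be split, and every sequence is assumed to be no longer than the basis signal. All function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares fit y ~ a*x + b; returns (a, b, sum of squared residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx if sxx else 0.0
    b = my - a * mx
    return a, b, sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))


def best_mapping(row, seq, basis):
    """Slide seq along basis one position at a time; keep the lowest-error tuple."""
    best = None
    for shift in range(len(basis) - len(seq) + 1):
        a, b, err = fit_line(basis[shift:shift + len(seq)], seq)
        if best is None or err < best[4]:
            best = (row, shift, a, b, err)
    return best


def decompose(sequences, basis, eps_total, min_len=6):
    """Halve the worst-fitting sequence until the summed error drops below eps_total."""
    items = [(best_mapping(row, seq, basis), seq) for row, seq in sequences.items()]
    while sum(m[4] for m, _ in items) >= eps_total:
        items.sort(key=lambda it: it[0][4], reverse=True)  # worst fit first
        worst, seq = items[0]
        half = len(seq) // 2
        if half < min_len:  # assumed stopping rule for short subsequences
            break
        left, right = seq[:half], seq[half:]
        items[0:1] = [
            (best_mapping(worst[0] * 10 + 1, left, basis), left),
            (best_mapping(worst[0] * 10 + 2, right, basis), right),
        ]
    return [m for m, _ in items]


basis = [i * i for i in range(16)]
seq = [2 * v + 1 for v in basis[0:6]] + [5 - v for v in basis[8:14]]
tuples = decompose({1: seq}, basis, eps_total=1e-6)
```

Here the 12-point sequence cannot be fitted well as a whole, but after one split its left half maps onto the basis signal at shift 0 and its right half at shift 8, so the result consists of the two tuples for rows 11 and 12.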

(3) Update the basis signal.

If the final overall error ε turns out to be greater than the user's set value ε_set, the decomposition result does not meet the requirement and a way to reduce the overall error ε must be found. The approach taken here is to update the basis signal while leaving the variable-length decomposition procedure for the remaining original data sequences unchanged. A combined use-amount and cumulative-error evaluation principle is proposed: among the data sequences in the basis signal, the use amount k of Definition 4 is applied to select the sequence that has been used the fewest times; when several sequences have the same number of uses, their cumulative errors are compared to find the one with the largest cumulative error. That sequence is replaced by the subsequence with the largest benefit value in the basis signal of the preceding segment of compressed data sequences in the same time period, so that the final overall error ε again meets the user's set value ε_set.
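The selection rule can be sketched in a few lines. The function names and the list-based bookkeeping of use counts, cumulative errors and benefit values are hypothetical; only the rule itself (fewest uses, ties broken by largest cumulative error, replacement by the highest-benefit subsequence of the preceding segment) comes from the text.

```python
def pick_replace_index(use_counts, cumulative_errors):
    """Index of the base sequence to replace: fewest uses, then largest error."""
    return min(
        range(len(use_counts)),
        key=lambda i: (use_counts[i], -cumulative_errors[i]),
    )


def update_basis(base_seqs, use_counts, cumulative_errors, prev_seqs, prev_benefits):
    """Return a copy of base_seqs with the selected sequence swapped out."""
    out_idx = pick_replace_index(use_counts, cumulative_errors)
    in_idx = max(range(len(prev_seqs)), key=lambda i: prev_benefits[i])
    updated = list(base_seqs)
    updated[out_idx] = prev_seqs[in_idx]
    return updated


# Illustrative values only.
new_basis = update_basis(
    base_seqs=[[1, 2], [3, 4], [5, 6], [7, 8]],
    use_counts=[5, 2, 2, 7],
    cumulative_errors=[0.1, 0.4, 0.9, 0.2],
    prev_seqs=[[9, 9], [0, 1]],
    prev_benefits=[1.5, 3.0],
)
```

After an update, step (2) would be rerun against the new basis signal to check whether ε now meets ε_set.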

Combining the analysis above, the general formula for the compression ratio of the PLSR_SBR double-layer real-time compression method in the IP-mode real-time database of the IP-mode collaborative measurement and control system is:

\frac{m \times n - (p \times q + q + Ins \times \sqrt{(m - q) \times n} + t \times 5)}{m \times n} \quad (26)

In formula (26), t is the total number of sequences obtained after variable-length decomposition of the remaining data sequences.
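Formula (26) can be evaluated directly, as in the short sketch below. The function name is hypothetical and the input values are illustrative only, not results from the patent. The term p × q + q corresponds to storing the q main parameters as (1+p)-tuples, Ins × sqrt((m - q) × n) to the basis signal, and t × 5 to the five-element tuples.

```python
import math


def compression_ratio(m, n, p, q, ins, t):
    """Evaluate formula (26): the fraction of the m*n values that is saved."""
    stored = p * q + q + ins * math.sqrt((m - q) * n) + t * 5
    return (m * n - stored) / (m * n)


# Illustrative numbers: 5 sequences of 100 points, 1 main parameter,
# p = 2, 3 base sequences and 10 five-element tuples.
ratio = compression_ratio(m=5, n=100, p=2, q=1, ins=3, t=10)
```

With these inputs the stored amount is 2 + 1 + 3 × 20 + 50 = 113 values out of 500, which gives a ratio of 0.774.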

The present invention solves a problem common to similar fields. It can be widely applied in the chemical industry, in grain and military munitions warehouses, and in tobacco, pharmaceutical and textile manufacturing, and it can be extended directly to modern greenhouses and similar settings. It has the following two main characteristics:

(1) The PLSR_SBR-based double-layer real-time compression method is best suited to the following conditions: main parameters and auxiliary parameters exist, and the data of different parameters, or of the same parameter over different time periods, are strongly correlated. The stronger the correlation between data sequences, the better the compression; the larger the volume of distributed data, the higher the compression ratio. This supports arbitrary expansion of measurement and control points on the IP-mode collaborative monitoring platform and reduces both the storage space of the real-time database and the data traffic of the platform;

(2) When the data volume is very large, compression can be applied to the same parameter across multiple nodes, which greatly improves the compression ratio.

The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be defined by the protection scope of the claims.

Claims

1. A distributed double-layer real-time compression method based on PLSR_SBR, characterized by comprising:

A. the first layer adopts a strategy of information decomposition and extraction, uses compression validity to determine the number of principal components, and establishes a definite model of the main parameters and the auxiliary parameters;

B. the second layer selects base sequences for the auxiliary parameters, establishes a basis signal that reflects the characteristics of the distributed data, and divides the original data sequences of the auxiliary parameters into a series of subsequences for compression.

2. The distributed double-layer real-time compression method according to claim 1, characterized in that step A further comprises:

A1. there are one or more main parameters;

A2. the PLSR method is applied to decompose and extract the distributed information of the IP-mode collaborative measurement and control system, so that modeling can be carried out when the sample size is smaller than the number of variables and the multicollinearity between the main parameters and the auxiliary parameters is eliminated; finally, the main parameters are stored as (1+p)-tuples.

3. The distributed double-layer real-time compression method according to claim 1, characterized in that step B further comprises:

B1. the auxiliary parameter sequences are concatenated into one sequence, which is then divided into K subsequences of W data points each; the subsequences are sorted in descending order of benefit value, and the first Ins subsequences, for which the overall error is smaller than with the first Ins-1 or Ins+1 base sequences, are concatenated to establish the basis signal;

B2. combining equal splitting with stepwise right shifting, variable-length decomposition is performed on each original data sequence; the number of subsequences is reduced subject to the system total error ε; if the total error cannot be made less than ε, the calculation is repeated in turn until the length of the subsequences produced by decomposition is less than 6; finally, each subsequence is stored as a five-element tuple;

B3. a comprehensive evaluation based on use amount and cumulative error is performed to update the basis signal, wherein the combined evaluation method is: select the sequence that has been used the fewest times; when several sequences with the same number of uses are selected, compare the cumulative error of each to determine the data sequence with the largest cumulative error; wherein the number of uses is the k in the least-squares linear fit of x_i {x_i1, ..., x_ik} (k < n) based on x_j {x_j1, ..., x_jn}.

4. The distributed double-layer real-time compression method according to claim 1 or 2, characterized in that, in applying PLSR to decompose and extract the distributed information of the IP-mode collaborative measurement and control system, a single main parameter (the dependent variable) is taken as the main object according to the coupling relationships among the system parameters, and the mathematical relationship between the main parameter and the auxiliary parameters is determined.

5. The determination of the number of subsequences according to claim 1 or 3, wherein a binary search is adopted such that the overall error of fitting the remaining original data sequences with the first Ins base sequences is smaller than the overall error obtained by fitting with the first Ins-1 or Ins+1 base sequences.

6. A distributed double-layer real-time compression IP-mode collaborative measurement and control system, characterized in that the system carries out data communication over Ethernet and the whole network monitoring platform is divided into three layers:

First layer: the field measurement and control layer, comprising field sensing and controlled-equipment modules, IP-mode intelligent measurement and control devices, IP fuzzy control devices, and the group of field controlled equipment;

Second layer: the enterprise-level monitoring layer, comprising an Ethernet switch or hub, field monitoring microcomputers, a real-time database server, a measurement and control strategy server, and a web server;

Third layer: the Internet-based remote measurement and control layer, mainly comprising remote workstations and terminal microcomputers.

7. The field measurement and control layer of the IP-mode collaborative measurement and control system according to claim 6, characterized in that IP-mode intelligent monitoring devices distributed at various field locations process the collected environmental parameter signals and send the data onto the Ethernet in real time; the field microcomputer, based on Delphi technology, centrally coordinates and manages each parameter; and the IP-mode fuzzy control devices receive the control strategy sent by the host computer and drive the group of field controlled equipment to carry out field control.

8. The enterprise-level monitoring layer of the IP-mode collaborative measurement and control system according to claim 6, characterized in that, based on Java and virtual instrument technology, it comprises three independently executing modules: strategy editing software, monitoring terminal software and real-time kernel software; graphical measurement and control strategies are edited with the strategy editing software and compiled into a series of micro-instructions, which are sent through the real-time kernel software to the field measurement and control devices for execution; the monitoring terminal software obtains the display data of the instrument panel from the real-time kernel software and generates a graphical user interface; the monitoring terminal software depends on the normal operation of the real-time kernel software, and several instances of the monitoring terminal software can connect to one real-time kernel at the same time, which enables monitoring from multiple remote locations.

9. The remote measurement and control layer of the IP-mode collaborative measurement and control system according to claim 6, characterized in that, based on JSP and XML, users carry out remote analysis of the data through remote terminal software or a web browser according to their own permissions, which provides data support for remote supervisors to issue measurement and control commands or make decisions.