CN110489451A - Stream computing method based on iterative statistics - Google Patents
- Publication number
- CN110489451A
- Authority
- CN
- China
- Prior art keywords
- statistical
- data
- value
- formula
- flow
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Probability & Statistics with Applications (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Complex Calculations (AREA)
Abstract
The present invention relates to the field of data processing, and specifically to a stream computing method based on iterative statistics that avoids blocking of the calculation procedure and improves the efficiency of computing statistical values. A data stream X of real-time data is obtained and split by time; the statistical value of the first period's data X_1 is calculated with a statistical formula G to obtain the initial value f_1; starting from f_1, formula G is applied iteratively to each period of data stream X in turn to obtain the current statistical value f_m of the current data X_m; if m < n, the iteration continues, and if m = n, the final statistical value f_n = G_n is output, where n is the number of data items into which data stream X is divided and m is the index of the current data item. The method does not need to compute all required statistical values in a single pass: by iterating the statistical formula in a loop, a large real-time computation is spread over the individual periods, long blocking waits in the calculation procedure are avoided, and efficient real-time computation is achieved.
Description
Technical field
The present invention relates to the field of data processing, and specifically to a stream computing method based on iterative statistics that avoids blocking of the calculation procedure and improves the efficiency of computing statistical values.
Background art
In a traditional data processing flow, data are always collected first and then stored in a database (DB). When people need to query the data, they query the DB to obtain answers or perform related processing. Although this seems reasonable, the results are quite limited: for certain problems, especially in real-time search applications, offline processing in the MapReduce style cannot solve the problem well, and stream computing emerged for this reason.
Stream computing is a computing model in which a data stream is loaded into computer memory in chronological order and processed efficiently in real time with memory as the carrier. Because it does not interact with external resources such as hard disks and networks during the computation, it achieves high computational efficiency.
In industrial manufacturing, data are generated in real time by large numbers of automated devices, and industrial big-data processing demands high real-time performance, with responses on the millisecond scale. Quickly computing real-time statistics within a given range and deriving data characteristics is therefore a problem that must be solved. Traditional stream computing has to compute over all the data to be counted at once, which is very inefficient. To avoid long blocking waits in the calculation procedure and achieve efficient computation, the idea of iterative statistics is introduced so that the large real-time computation is decomposed across the individual periods, and a stream computing method based on iterative statistics is designed.
Summary of the invention
The object of the invention is to provide a stream computing method based on iterative statistics that avoids blocking of the calculation procedure and improves the efficiency of computing statistical values.
The present invention is achieved through the following technical solution: a stream computing method based on iterative statistics, comprising the following steps:
Step S1: obtain the data stream X of real-time data and split data stream X by time;
Step S2: calculate the statistical value of the first period's data X_1 using statistical formula G to obtain the initial value f_1;
Step S3: starting from the initial value f_1 and statistical formula G, iterate through each period of data stream X in turn using a loop structure, so as to obtain in real time the current statistical value f_m of the current data X_m;
Step S4: if m < n, continue with step S3; if m = n, go to step S5;
Step S5: output the final statistical value f_n = G_n, where f is the statistical value, n is the total number of data items into which data stream X is divided, and m is the index of the current data item used in the calculation.
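Steps S1 to S5 amount to a fold over the split stream. The Python sketch below is a minimal illustration, not the patent's implementation: it assumes each period contributes one value X_m (Embodiment 2 uses n both for the number of periods and for the number of data items), and the names `mean_step` and `iterative_statistic` and the signature G(f_prev, x_m, m) are hypothetical. The expectation formula serves as the example G.

```python
from typing import Callable, Iterable, Optional

# Hypothetical signature for statistical formula G: G(f_prev, x_m, m) -> f_m
IterFormula = Callable[[Optional[float], float, int], float]


def mean_step(f_prev: Optional[float], x_m: float, m: int) -> float:
    """Expectation formula in iterative form: f_m = ((m - 1) * f_{m-1} + x_m) / m."""
    if f_prev is None:
        return float(x_m)  # step S2: the first period gives the initial value f_1
    return ((m - 1) * f_prev + x_m) / m


def iterative_statistic(stream: Iterable[float], G: IterFormula) -> Optional[float]:
    """Steps S2 to S5 over an already split stream X_1, ..., X_n (step S1)."""
    f: Optional[float] = None  # no statistic before the first period
    m = 0
    for x_m in stream:         # steps S3/S4: keep iterating while m < n
        m += 1
        f = G(f, x_m, m)       # current statistic f_m
    return f                   # step S5: m == n, output f_n (None if the stream was empty)
```

For instance, `iterative_statistic([4.0, 8.0, 6.0], mean_step)` returns 6.0. Passing G as a callable keeps the loop of steps S3 and S4 independent of which statistical formula is chosen.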
Further, to better implement the invention, the following setting is used: in step S1, data stream X is read in real time from an in-memory pipeline and is divided periodically by time into n segments, namely data X_1, X_2, ..., X_n.
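The text does not say how the period boundaries are chosen. A minimal sketch, assuming timestamped `(timestamp, value)` samples, a fixed period length, and that the first sample carries the earliest timestamp; `split_by_period` is a hypothetical helper:

```python
from typing import Dict, Iterable, List, Optional, Tuple


def split_by_period(samples: Iterable[Tuple[float, float]], period: float) -> List[List[float]]:
    """Group (timestamp, value) samples into consecutive periods X_1 ... X_n.

    Period k (0-based) covers timestamps in [t0 + k*period, t0 + (k+1)*period),
    where t0 is the first timestamp. Periods with no samples become empty lists.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    buckets: Dict[int, List[float]] = {}
    t0: Optional[float] = None
    for ts, value in samples:
        if t0 is None:
            t0 = ts  # the first sample anchors the period grid
        if ts < t0:
            raise ValueError("the first sample must have the earliest timestamp")
        k = int((ts - t0) // period)
        buckets.setdefault(k, []).append(value)
    if not buckets:
        return []
    return [buckets.get(k, []) for k in range(max(buckets) + 1)]
```

With a period of 1.0, the samples `(0.0, 1.0), (0.5, 2.0), (1.2, 3.0), (3.1, 4.0)` fall into `[[1.0, 2.0], [3.0], [], [4.0]]`.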
Further, to better implement the invention, the following setting is used: in step S2, statistical formula G is an extremum (maximum or minimum) formula.
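The text names an extremum formula without giving its iterative form. Assuming "extremum" means the running minimum and maximum, each step only compares the previous extremes with the newest value, which fits the G(f_{m-1}, X_m, m) shape; the function names below are hypothetical:

```python
from typing import Iterable, Optional, Tuple

Extrema = Tuple[float, float]  # (running minimum, running maximum)


def extremum_step(f_prev: Optional[Extrema], x_m: float, m: int) -> Extrema:
    """Iterative extremum: compare the previous (min, max) with the newest value only.

    m is unused here but kept so the step matches the G(f_{m-1}, X_m, m) shape.
    """
    if f_prev is None:  # the first data item initialises both extremes
        return (x_m, x_m)
    lo, hi = f_prev
    return (min(lo, x_m), max(hi, x_m))


def running_extrema(stream: Iterable[float]) -> Optional[Extrema]:
    """Fold extremum_step over the stream; None for an empty stream."""
    f: Optional[Extrema] = None
    for m, x_m in enumerate(stream, start=1):
        f = extremum_step(f, x_m, m)
    return f
```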
Further, to better implement the invention, the following setting is used: statistical formula G is a variance formula.
Further, to better implement the invention, the following setting is used: statistical formula G is a standard deviation formula.
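No iterative formula for the variance or the standard deviation is given in the text. One well-known option with O(1) work per item, swapped in here as an assumption rather than taken from the patent, is Welford's online algorithm: it keeps the count, the running mean and the sum of squared deviations M2, and the standard deviation is the square root of the variance. The sketch assumes population variance:

```python
import math
from typing import Tuple

# Welford state: (m, running mean, M2 = sum of squared deviations from the mean)
State = Tuple[int, float, float]


def welford_step(state: State, x_m: float) -> State:
    """One O(1) update of Welford's online variance algorithm."""
    m, mean, m2 = state
    m += 1
    delta = x_m - mean
    mean += delta / m
    m2 += delta * (x_m - mean)  # uses the updated mean, as Welford's method requires
    return (m, mean, m2)


def variance(state: State) -> float:
    """Population variance M2 / m (use M2 / (m - 1) for the sample variance)."""
    m, _, m2 = state
    if m == 0:
        raise ValueError("variance of an empty stream is undefined")
    return m2 / m


def std_dev(state: State) -> float:
    """Standard deviation as the square root of the population variance."""
    return math.sqrt(variance(state))
```

The state is three numbers regardless of stream length, in line with the O(1) space argument made for the expectation formula.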
Further, to better implement the invention, the following setting is used: statistical formula G is an expectation formula.
Further, to better implement the invention, the following setting is used: when statistical formula G is the expectation formula, the final output statistical value is f_n = G_n = E(X_n), where $f_n = \frac{(n-1) f_{n-1} + X_n}{n}$, n is the number of data items in data stream X, f_n is the statistical value of all data in data stream X, f_{n-1} is the statistical value of the data stream excluding the latest value, i.e., the statistical value up to the previous data item X_{n-1}, and X_n is the latest data value.
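Assuming "expectation" means the arithmetic mean of the data seen so far, the iterative form follows directly from the definition, which shows why one multiplication, one addition and one division suffice per update:

```latex
f_n = \frac{1}{n}\sum_{i=1}^{n} X_i
    = \frac{1}{n}\left(\sum_{i=1}^{n-1} X_i + X_n\right)
    = \frac{(n-1)\,f_{n-1} + X_n}{n},
\qquad\text{since } \sum_{i=1}^{n-1} X_i = (n-1)\,f_{n-1}.
```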
Further, to better implement the invention, the following setting is used: in step S3, the current statistical value is f_m = G_m = E(X_m), where $f_m = \frac{(m-1) f_{m-1} + X_m}{m}$, m is the current number of data items in data stream X, f_m is the statistical value of the current data in data stream X, f_{m-1} is the statistical value of the data stream excluding the current value, i.e., the statistical value up to the previous data item X_{m-1}, and X_m is the current data value.
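Because f_m is refreshed on every iteration of step S3, the current statistic is available after each data item, not only at the end. A minimal generator sketch of this, assuming scalar data and the expectation formula (`running_means` is a hypothetical name):

```python
from typing import Iterable, Iterator


def running_means(stream: Iterable[float]) -> Iterator[float]:
    """Yield the current statistic f_m after every data item X_m (step S3)."""
    f = 0.0  # f_0; its value is irrelevant because it is multiplied by (m - 1) = 0 at m = 1
    for m, x_m in enumerate(stream, start=1):
        f = ((m - 1) * f + x_m) / m
        yield f
```

`list(running_means([4.0, 8.0, 6.0]))` gives `[4.0, 6.0, 6.0]`; a consumer can read each value as soon as its data item arrives.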
Compared with the prior art, the present invention has the following advantages and beneficial effects: the stream computing method based on iterative statistics does not need to compute all required statistical values in one pass; by iterating the statistical formula in a loop, the large real-time computation is decomposed into the individual periods, long blocking waits in the calculation procedure are avoided, and efficient real-time computation is achieved.
In addition, algorithm efficiency is improved and space complexity is reduced. According to the iterative statistics formula f_n = G(f_{n-1}, X_n, n), only three variables, n, X_n and f_n, reside in memory; when the data stream changes, these three variables are updated iteratively, so the space complexity remains O(1).
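The three resident variables n, X_n and f_n can be held in a small fixed-size object. This is a sketch under the assumption that G is the expectation formula; the class name `IterativeMean` is hypothetical, and `__slots__` is used only to make the fixed state explicit:

```python
from typing import Optional


class IterativeMean:
    """Expectation kept with three resident variables: n, x_n, f_n (O(1) space)."""

    __slots__ = ("n", "x_n", "f_n")  # no per-instance dict; only these three fields

    def __init__(self) -> None:
        self.n: int = 0                   # number of data items seen so far
        self.x_n: Optional[float] = None  # latest data value X_n
        self.f_n: Optional[float] = None  # current statistic f_n

    def update(self, x: float) -> float:
        """Apply f_n = G(f_{n-1}, X_n, n) with G = ((n - 1) * f_{n-1} + X_n) / n."""
        self.n += 1
        self.x_n = x
        prev = 0.0 if self.f_n is None else self.f_n
        self.f_n = ((self.n - 1) * prev + x) / self.n
        return self.f_n
```

Feeding it a stream of any length leaves the object's size unchanged, since only the three fields are overwritten.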
Furthermore, algorithm efficiency is improved and time complexity is reduced. With a traditional statistical calculation method, each new data item in stream computing requires n - 1 additions and 1 division, n operations in total, so the time complexity is O(n); when n is very large, each update becomes correspondingly expensive and the calculation procedure may block. With the iterative statistics method, according to the formula f_n = G(f_{n-1}, X_n, n), each update needs only 1 multiplication, 1 addition and 1 division, 3 operations in total, so the time complexity remains O(1).
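The operation counts in the paragraph above can be checked with a small experiment. This sketch assumes the expectation formula and counts arithmetic operations the way the text does: recomputing the mean from scratch costs (m - 1) additions plus one division at step m, while the iterative update costs one multiplication, one addition and one division (the index arithmetic m - 1 is not counted). The function names are hypothetical:

```python
from typing import List, Tuple


def naive_running_mean(xs: List[float]) -> Tuple[List[float], int]:
    """Recompute the mean of X_1..X_m from scratch at every step m; count operations."""
    means: List[float] = []
    ops = 0
    for m in range(1, len(xs) + 1):
        total = xs[0]
        for x in xs[1:m]:
            total += x
            ops += 1          # (m - 1) additions
        means.append(total / m)
        ops += 1              # 1 division
    return means, ops


def iterative_running_mean(xs: List[float]) -> Tuple[List[float], int]:
    """Update f_m = ((m - 1) * f_{m-1} + X_m) / m; count operations."""
    means: List[float] = []
    ops = 0
    f = 0.0
    for m, x in enumerate(xs, start=1):
        f = ((m - 1) * f + x) / m
        ops += 3              # 1 multiplication, 1 addition, 1 division
        means.append(f)
    return means, ops
```

Over n items the recompute-from-scratch approach performs n(n+1)/2 counted operations in total, against 3n for the iterative one; for n = 1000 that is 500500 versus 3000, with the same means up to rounding.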
Description of the drawings
Fig. 1 is a flow diagram of the stream computing method based on iterative statistics of the invention;
Fig. 2 is a schematic diagram of the loop structure of the stream computing method based on iterative statistics of the invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the invention, and are not to be construed as limiting it.
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. The components of the embodiments generally described and illustrated in the drawings herein can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the application, but merely represents selected embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments herein without creative effort fall within the protection scope of this application.
Embodiment 1:
The present invention is achieved through the following technical solution. As shown in Fig. 1 and Fig. 2, the stream computing method based on iterative statistics of the invention comprises the following steps:
Step S1: obtain the data stream X of real-time data and split data stream X by time;
Step S2: calculate the statistical value of the first period's data X_1 using statistical formula G to obtain the initial value f_1;
Step S3: starting from the initial value f_1 and statistical formula G, iterate through each period of data stream X in turn using a loop structure, so as to obtain in real time the current statistical value f_m of the current data X_m;
Step S4: if m < n, continue with step S3; if m = n, go to step S5;
Step S5: output the final statistical value f_n = G_n, where f is the statistical value, n is the total number of data items into which data stream X is divided, and m is the index of the current data item used in the calculation.
The stream computing method based on iterative statistics of the invention does not need to compute all required statistical values in one pass; by iterating the statistical formula in a loop, the large real-time computation is decomposed into the individual periods, long blocking waits in the calculation procedure are avoided, and efficient real-time computation is achieved.
In addition, algorithm efficiency is improved and space complexity is reduced. According to the iterative statistics formula f_n = G(f_{n-1}, X_n, n), only three variables, n, X_n and f_n, reside in memory; when the data stream changes, these three variables are updated iteratively, so the space complexity remains O(1).
Furthermore, algorithm efficiency is improved and time complexity is reduced. With a traditional statistical calculation method, each new data item in stream computing requires n - 1 additions and 1 division, n operations in total, so the time complexity is O(n); when n is very large, each update becomes correspondingly expensive and the calculation procedure may block. With the iterative statistics method, according to the formula f_n = G(f_{n-1}, X_n, n), each update needs only 1 multiplication, 1 addition and 1 division, 3 operations in total, so the time complexity remains O(1).
When stream computing is performed, the data stream X of real-time data is first obtained and split along the time series at period intervals into data X_1, X_2, ..., X_n, one per period. Statistical formula G is then applied to the first period's data X_1 to obtain the first period's statistical value f_1. Using f_1 as the initial value together with the next period's data X_2, the second period's statistical value f_2 is computed iteratively: the first iteration, for n = 1, is f_1 = G(f_0, X_1, 1), and for n = 2 it is f_2 = G(f_1, X_2, 2). Iterating in this way yields the statistical value f_m of the current data X_m. The method then checks whether m equals the final number of periods n: if m < n, iteration continues; if m = n, the final statistical value f_n is output. All required statistical values therefore need not be computed in one pass: iterating the statistical formula in a loop spreads the large real-time computation over the individual periods, avoiding long blocking waits in the calculation procedure and achieving efficient real-time computation.
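A small worked example of this iteration, assuming G is the expectation formula $f_m = \frac{(m-1) f_{m-1} + X_m}{m}$ and three periods with data X_1 = 4, X_2 = 8, X_3 = 6 chosen only for illustration; the value of f_0 does not matter because it is multiplied by m - 1 = 0:

```latex
\begin{aligned}
f_1 &= G(f_0, X_1, 1) = \frac{0 \cdot f_0 + 4}{1} = 4,\\
f_2 &= G(f_1, X_2, 2) = \frac{1 \cdot 4 + 8}{2} = 6,\\
f_3 &= G(f_2, X_3, 3) = \frac{2 \cdot 6 + 6}{3} = 6 = \frac{4 + 8 + 6}{3}.
\end{aligned}
```

The final value f_3 = 6 equals the batch mean of all three values, which is what the iterative form is meant to reproduce.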
Embodiment 2:
This embodiment is a further optimization of the above embodiments. In step S1, data stream X is read in real time from an in-memory pipeline and divided periodically by time into n segments, namely data X_1, X_2, ..., X_n, where n is both the number of periods and the number of data items into which data stream X is split.
Embodiment 3:
This embodiment is a further optimization of the above embodiments. In step S2, statistical formula G is an extremum formula, a variance formula, a standard deviation formula or an expectation formula. Statistical formula G can be chosen according to actual requirements: it may be the variance formula, the standard deviation formula or the expectation formula, or another formula such as the extremum formula.
Embodiment 4:
This embodiment is a further optimization of the above embodiments. When statistical formula G is the expectation formula, the final output statistical value is f_n = G_n = E(X_n), where $f_n = \frac{(n-1) f_{n-1} + X_n}{n}$, n is the number of data items in data stream X, f_n is the statistical value of all data in data stream X, f_{n-1} is the statistical value of the data stream excluding the latest value, i.e., the statistical value up to the previous data item X_{n-1}, and X_n is the latest data value.
Embodiment 5:
This embodiment is a further optimization of the above embodiments. In step S3, with the expectation formula used as statistical formula G, the current statistical value is f_m = G_m = E(X_m), where $f_m = \frac{(m-1) f_{m-1} + X_m}{m}$, m is the current number of data items in data stream X, f_m is the statistical value of the current data in data stream X, f_{m-1} is the statistical value of the data stream excluding the current value, i.e., the statistical value up to the previous data item X_{m-1}, and X_m is the current data value. When stream computing is performed, the data stream X of real-time data is first obtained and split along the time series at period intervals into data X_1, X_2, ..., X_n, one per period. Statistical formula G is then applied to the first period's data X_1 to obtain the first period's statistical value f_1; using f_1 as the initial value together with the next period's data X_2, the second period's statistical value f_2 is computed iteratively, and so on until the statistical value f_m of the current data X_m is obtained. The method checks whether m equals the final number of periods n: if m < n, iteration continues; if m = n, the final statistical value f_n is output. All required statistical values therefore need not be computed in one pass: iterating the statistical formula in a loop spreads the large real-time computation over the individual periods, avoiding long blocking waits in the calculation procedure and achieving efficient real-time computation.
The above are only preferred embodiments of the present invention and do not limit the invention in any form. Any simple modification or equivalent variation made to the above embodiments according to the technical essence of the invention falls within the protection scope of the invention.
Claims (8)
1. A stream computing method based on iterative statistics, characterized by comprising the following steps:
Step S1: obtaining the data stream X of real-time data and splitting data stream X by time;
Step S2: calculating the statistical value of the first period's data X_1 using statistical formula G to obtain the initial value f_1;
Step S3: starting from the initial value f_1 and statistical formula G, iterating through each period of data stream X in turn using a loop structure, so as to obtain in real time the current statistical value f_m of the current data X_m;
Step S4: if m < n, continuing with step S3; if m = n, going to step S5;
Step S5: outputting the final statistical value f_n = G_n, where f is the statistical value, n is the total number of data items into which data stream X is divided, and m is the index of the current data item used in the calculation.
2. The stream computing method based on iterative statistics according to claim 1, characterized in that in step S1, data stream X is read in real time from an in-memory pipeline and divided periodically by time into n segments, namely data X_1, X_2, ..., X_n.
3. The stream computing method based on iterative statistics according to claim 2, characterized in that in step S2, statistical formula G is an extremum formula.
4. The stream computing method based on iterative statistics according to claim 2, characterized in that statistical formula G is a variance formula.
5. The stream computing method based on iterative statistics according to claim 2, characterized in that statistical formula G is a standard deviation formula.
6. The stream computing method based on iterative statistics according to claim 2, characterized in that statistical formula G is an expectation formula.
7. The stream computing method based on iterative statistics according to claim 6, characterized in that when statistical formula G is the expectation formula, the final output statistical value is f_n = G_n = E(X_n), where $f_n = \frac{(n-1) f_{n-1} + X_n}{n}$, n is the number of data items in data stream X, f_n is the statistical value of all data in data stream X, f_{n-1} is the statistical value of the data stream excluding the latest value, i.e., the statistical value up to the previous data item X_{n-1}, and X_n is the latest data value.
8. The stream computing method based on iterative statistics according to claim 7, characterized in that in step S3, the current statistical value is f_m = G_m = E(X_m), where $f_m = \frac{(m-1) f_{m-1} + X_m}{m}$, m is the current number of data items in data stream X, f_m is the statistical value of the current data in data stream X, f_{m-1} is the statistical value of the data stream excluding the current value, i.e., the statistical value up to the previous data item X_{m-1}, and X_m is the current data value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910745061.3A CN110489451A (en) | 2019-08-13 | 2019-08-13 | Flow calculation methodologies based on Iterative statistical |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110489451A (en) | 2019-11-22 |
Family ID: 68549721
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910745061.3A Pending CN110489451A (en) | 2019-08-13 | 2019-08-13 | Flow calculation methodologies based on Iterative statistical |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110489451A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8055970B1 (en) * | 2005-11-14 | 2011-11-08 | Raytheon Company | System and method for parallel processing of data integrity algorithms |
CN104267939A (en) * | 2014-09-17 | 2015-01-07 | 华为技术有限公司 | Business processing method, device and system |
CN104915247A (en) * | 2015-04-29 | 2015-09-16 | 上海瀚银信息技术有限公司 | Real time data calculation method and system |
CN108256045A (en) * | 2018-01-12 | 2018-07-06 | 福建星瑞格软件有限公司 | The structuring parsing of real-time streaming data, the method and computer equipment of stream calculation |
CN108804781A (en) * | 2018-05-25 | 2018-11-13 | 武汉大学 | The geographical process near real-time analogy method that stream calculation is integrated with Sensor Network |
CN109542946A (en) * | 2018-10-26 | 2019-03-29 | 贵州斯曼特信息技术开发有限责任公司 | It is a kind of to calculate big data system and method in real time |
CN109800129A (en) * | 2019-01-17 | 2019-05-24 | 青岛特锐德电气股份有限公司 | A kind of real-time stream calculation monitoring system and method for processing monitoring big data |
2019
- 2019-08-13: CN201910745061.3A filed in CN (CN110489451A), status: Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8276135B2 (en) | Profiling of software and circuit designs utilizing data operation analyses | |
Mandal et al. | Design of optimal linear phase FIR high pass filter using craziness based particle swarm optimization technique | |
US20110282866A1 (en) | System And Method For Retrieving And Processing Information From A Supervisory Control Manufacturing/Production Database | |
CN110187965B (en) | Operation optimization and data processing method and device of neural network and storage medium | |
Pintelon et al. | Frequency domain system identification with missing data | |
Metaxoglou et al. | Maximum likelihood estimation of VARMA models using a state‐space EM algorithm | |
CN107608870A (en) | A kind of statistical method and system of system resource utilization rate | |
CN110489451A (en) | Flow calculation methodologies based on Iterative statistical | |
CN110222402A (en) | Electrical design system and method | |
Li et al. | A novel self-similar traffic prediction method based on wavelet transform for satellite Internet | |
Mandal et al. | FIR band stop filter optimization by improved particle swarm optimization | |
CN106407272A (en) | Service statistical line display method and device | |
Mandal et al. | Design of optimal linear phase fir high pass filter using improved particle swarm optimization | |
WO2020189360A1 (en) | Pipeline computing apparatus, programmable logic controller, and pipeline processing execution method | |
Deryckere et al. | Online matching with set and concave delays | |
Chong et al. | Efficient extraction of high-betweenness vertices | |
Mukhopadhyay et al. | Optimal design of linear phase FIR band stop filter using particle swarm optimization with improved inertia weight technique | |
Bordoloi et al. | Design space exploration of instruction set customizable MPSoCs for multimedia applications | |
CN110635780A (en) | Variable-rate baseband pulse shaping filter implementation method based on FPGA and filter | |
Boccadoro et al. | A modelling approach for the dynamic scheduling problem of manufacturing systems with non negligible setup times and finite buffers | |
CN109143017B (en) | Production test data processing method for semiconductor industry | |
CN113343064B (en) | Data processing method, apparatus, device, storage medium, and computer program product | |
CN111754036B (en) | Cantilever pre-batching nesting method, processing device and terminal equipment | |
RU2681694C1 (en) | Method of constructing physical structure of user terminal of info-communication system | |
CN110941541B (en) | Method and device for problem grading of data stream service |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191122 |