CN106407161A - Distributed calculating method of standard deviation - Google Patents

Info

Publication number
CN106407161A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611032295.6A
Other languages
Chinese (zh)
Inventor
卓颋
刘洪明
殷荣华
高海军
何涵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Joan Beijing Innovation Technology Co Ltd
Chongqing University of Post and Telecommunications
Original Assignee
Joan Beijing Innovation Technology Co Ltd
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Joan Beijing Innovation Technology Co Ltd, Chongqing University of Post and Telecommunications
Priority to CN201611032295.6A
Publication of CN106407161A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/18: Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04: Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Finance (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Operations Research (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Algebra (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a distributed calculation method for standard deviation. The method comprises the following steps: 1) inputting each local population $P_i$; 2) calculating the mean $\mu_i$ and standard deviation STD.$P_i$ of each local population $P_i$, together with the number of data values $n_i$ in that local population; 3) calculating the overall mean of the input data according to a formula; and 4) calculating the overall standard deviation using a formula. With this method, the overall standard deviation can be calculated as long as the mean, standard deviation, and count of each local population are known. The method substantially reduces the amount of calculation, and because the separately stored local populations do not need to be read repeatedly, it saves a large amount of query and access time and greatly improves practical calculation efficiency.

Description

Distributed calculation method for standard deviation
Technical field
The present invention relates to the technical field of standard deviation calculation, and in particular to a distributed calculation method for standard deviation.
Background art
Standard deviation is defined as the square root of the arithmetic mean of the squared deviations of each value in a population from the population mean. In statistics, standard deviation is commonly used to measure the spread and dispersion of a set of values: the larger the standard deviation, the more most values differ from their mean. In physics, for example, when a measurement is repeated, the standard deviation of the set of measured values indicates the precision of those measurements. The prior art mainly offers the following methods for obtaining the standard deviation:
1. The sampling calculation method for standard deviation: a sample is drawn from the overall data, and the sample standard deviation is calculated and used in place of the population standard deviation.
However, sampling introduces sampling bias, and in a big data environment this bias becomes more pronounced.
2. The traditional calculation method for population standard deviation:
By definition, the standard deviation is the square root of the mean of the squared differences between each data value and the mean, where
the mean $\mu$ is calculated as:
$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i \quad (1)$$
and the standard deviation $\sigma$ is calculated as:
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2} \quad (2)$$
Formula (3) can be derived from formula (2); the derivation is omitted.
In a big data environment, the amount of calculation required by the traditional method is very large, which makes it impractical.
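A minimal Python sketch of the definition-based calculation, assuming the population form (division by n). The second function uses the common algebraic shortcut in which the variance equals the mean of the squares minus the square of the mean; treating that as the form of formula (3) is an assumption. The function names and the sample data are illustrative only.

```python
import math

def population_std(values):
    """Two-pass population standard deviation, straight from the definition."""
    n = len(values)
    mu = sum(values) / n                        # mean of all values
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

def population_std_shortcut(values):
    """Same quantity via mean of squares minus squared mean (assumed shortcut form)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum(x * x for x in values) / n - mu * mu)

data = [2, 4, 4, 4, 5, 5, 7, 9]                 # illustrative data, not from the patent
print(population_std(data), population_std_shortcut(data))
```

Both functions read every value, which is the cost the text describes as impractical for very large data sets.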
3. The iterative calculation method for standard deviation:
When new data arrives, the traditional method must fetch all of the original data values again and combine them with the new data to calculate the new standard deviation. To address this problem, the iterative calculation method for standard deviation was proposed.
Assume a time series of data:
$x_1, x_2, x_3, x_4, \ldots, x_n, x_{n+1}, \ldots$
At time point $n$ the value $x_n$ is obtained, and at time point $n+1$ the value $x_{n+1}$ is obtained. Whenever a new value arrives, the standard deviation of the $n$ values in a time window of length $n$, including the new value, must be calculated.
The key steps are as follows. First calculate the sum $X_n$ of the values in the window (formula (4)) and the corresponding standard deviation $\mathrm{STD.S}_n$ (formula (5)).
Then, when one new value is added, calculate the window sum $X_{n+1}$ and the standard deviation $\mathrm{STD.S}_{n+1}$ iteratively:
$$X_{n+1} = X_n + x_{n+1} - x_1 \quad (6)$$
Formula (6) iteratively calculates the sum of the data in a window of length $n$, and formula (7) iteratively calculates the standard deviation of the data in a window of length $n$.
Replacing the denominator $(n-1)$ with $n$ gives the iterative calculation method for the population standard deviation (formula (8)).
For streaming data, the iterative method for population standard deviation handles newly added data simply and conveniently, but whenever new data arrives, all of the data still has to be processed again, which causes redundant computation.
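The sliding-window update of formula (6) can be sketched in Python as follows. The running sum follows formula (6) directly; the running sum of squares and the standard deviation derived from it are an assumption made for illustration and are not claimed to match formulas (7) or (8). The class name `RollingWindow` is hypothetical.

```python
import math
from collections import deque

class RollingWindow:
    """Fixed-length window that keeps a running sum, as in formula (6)."""

    def __init__(self, initial):
        self.window = deque(initial)
        self.total = sum(initial)                       # X_n
        self.total_sq = sum(x * x for x in initial)     # assumed helper, not in the source

    def push(self, x_new):
        x_old = self.window.popleft()                   # oldest value leaves the window
        self.window.append(x_new)
        self.total += x_new - x_old                     # X_{n+1} = X_n + x_{n+1} - x_1
        self.total_sq += x_new * x_new - x_old * x_old

    def population_std(self):
        n = len(self.window)
        mean = self.total / n
        # Clamp at zero to guard against tiny negative values from rounding.
        return math.sqrt(max(self.total_sq / n - mean * mean, 0.0))

w = RollingWindow([1, 2, 3, 4])                         # illustrative data
w.push(5)
print(w.total, w.population_std())
```

Because the window discards old values, the statistic describes only the most recent n values, not the whole data set.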
4. The incremental calculation method for standard deviation:
To solve the technical problem that the traditional method requires a large amount of calculation, the incremental calculation method for standard deviation was also proposed.
Starting from formula (1), the method first derives two relations, formulas (9) and (10):
$$x_n - \mu_{n-1} = n(\mu_n - \mu_{n-1}) \quad (9)$$
Then, since the sum of squared deviations $S_n$ is given by formula (11),
combining formulas (9) and (10) yields:
$$S_n = S_{n-1} + (x_n - \mu_{n-1})(x_n - \mu_n) \quad (12)$$
from which the standard deviation is obtained (formula (13)).
The incremental method needs only the standard deviation and variance of the previous overall data and the single new value in order to calculate the standard deviation of the new population. However, when faced with big data in distributed storage, every value in each of the other distributed local populations must be treated as a subsequent increment and substituted in one by one; the existing mean and standard deviation of each local population cannot be used directly, so computational efficiency is still not high.
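The recurrence in formula (12) can be sketched in Python as below. The mean update is formula (9) rearranged; the final step assumes that formula (13) is the population form, the square root of S_n divided by n. The function names and the sample data are illustrative.

```python
import math

def add_value(n, mean, s, x):
    """Fold one new value x into the running triple (n, mean, S)."""
    n_new = n + 1
    mean_new = mean + (x - mean) / n_new        # formula (9), solved for the new mean
    s_new = s + (x - mean) * (x - mean_new)     # formula (12)
    return n_new, mean_new, s_new

def population_std(n, s):
    """Assumed form of formula (13): sqrt(S_n / n)."""
    return math.sqrt(s / n)

n, mean, s = 0, 0.0, 0.0
for x in [2, 4, 4, 4, 5, 5, 7, 9]:              # illustrative data
    n, mean, s = add_value(n, mean, s, x)
print(n, mean, population_std(n, s))
```

Each call touches only one value, so folding in a whole remote partition still costs one update per stored value, which is the limitation the paragraph above points out.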
In summary, the method for the traditional calculations standard deviation according to standard deviation definition needs the deviation from average of each data value Square calculate, computationally intensive when data volume is a lot, when have new data enter fashionable it is necessary to recalculate overall average and new Sum of sguares of deviation from mean, therefore there is redundancy in its calculating.Though the incremental calculation method of existing standard deviation is all in the past without access Input data thus make use of known condition, but if the data bulk inputting afterwards than larger when, will enter afterwards Each data value carry out incremental computations one by one, then its amount of calculation nor substantially reduced.
Summary of the invention
In view of this, the object of the present invention is to provide a distributed calculation method for standard deviation in which the overall standard deviation can be calculated as long as the mean, standard deviation, and count of each local population are known, thereby solving the technical problem that existing standard deviation calculation methods require a large amount of calculation.
The distributed calculation method for standard deviation of the present invention comprises the following steps:
1) input each local population $P_i$;
2) calculate the mean $\mu_i$, the standard deviation $\sigma_i$, and the number of data values $n_i$ of each local population $P_i$;
3) calculate the mean of the input overall population according to the formula $\mu_t = \sum_{i} n_i \mu_i \big/ \sum_{i} n_i$;
4) calculate the standard deviation of the input overall population using the formula for the overall standard deviation.
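The following standard identities express the overall mean and the overall population standard deviation in terms of the per-partition statistics. They are offered as an assumption about what the formulas in steps 3) and 4) compute, not as the patent's exact expressions; the symbols $\mu_t$, $\sigma_t$, and $l$ (the number of local populations) follow the notation used elsewhere in the description.

```latex
\mu_t = \frac{\sum_{i=1}^{l} n_i \mu_i}{\sum_{i=1}^{l} n_i},
\qquad
\sigma_t = \sqrt{\frac{\sum_{i=1}^{l} n_i \left( \sigma_i^2 + (\mu_i - \mu_t)^2 \right)}{\sum_{i=1}^{l} n_i}}

% Justification: within each local population P_i the cross term vanishes, so
\sum_{x \in P_i} (x - \mu_t)^2
  = \sum_{x \in P_i} (x - \mu_i)^2 + n_i (\mu_i - \mu_t)^2
  = n_i \sigma_i^2 + n_i (\mu_i - \mu_t)^2
```

Summing the last line over all local populations and dividing by the total count gives the overall variance, so only the triples $(n_i, \mu_i, \sigma_i)$ are needed.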
Beneficial effects of the present invention:
With the distributed calculation method for standard deviation of the present invention, the standard deviation of the overall data can be calculated as long as the mean, standard deviation, and count of each local data set are known. The method substantially reduces the amount of calculation, and because all of the separately stored data does not need to be read repeatedly, it saves a large amount of query and access time and greatly improves practical calculation efficiency.
Brief description of the drawings
Fig. 1 is a flowchart of the distributed calculation method for standard deviation of the present invention;
Fig. 2 is a computation model diagram of the distributed calculation method for standard deviation of the present invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and embodiments.
The distributed calculation method for standard deviation of this embodiment comprises the following steps:
1) input each local population $P_i$;
2) calculate the mean $\mu_i$, the standard deviation $\sigma_i$, and the number of data values $n_i$ of each local population $P_i$;
3) calculate the mean of the input overall population according to the formula $\mu_t = \sum_{i} n_i \mu_i \big/ \sum_{i} n_i$;
4) calculate the standard deviation of the input overall population using the formula for the overall standard deviation. The standard deviation $\sigma_i$ of each local population can be obtained with the traditional calculation method or the incremental calculation method described in the background art.
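The four steps can be sketched in Python as follows. The closed form used for step 4 is the standard pooled-variance identity for population standard deviations; it is an assumption that this is equivalent to the patent's formula, which is not reproduced here. The function name `combine` and the two-partition example are hypothetical.

```python
import math

def combine(stats):
    """Combine (n_i, mu_i, sigma_i) triples, one per local population."""
    stats = list(stats)
    n_total = sum(n for n, _, _ in stats)
    # Step 3: count-weighted mean of the local means.
    mu_total = sum(n * mu for n, mu, _ in stats) / n_total
    # Step 4 (assumed identity): within-partition plus between-partition spread.
    pooled = sum(n * (sigma ** 2 + (mu - mu_total) ** 2) for n, mu, sigma in stats)
    return n_total, mu_total, math.sqrt(pooled / n_total)

# Two hypothetical partitions of [2, 4, 4, 4, 5, 5, 7, 9]:
# [2, 4, 4, 4] has mean 3.5 and variance 0.75; [5, 5, 7, 9] has mean 6.5 and variance 2.75.
n_t, mu_t, sigma_t = combine([(4, 3.5, math.sqrt(0.75)), (4, 6.5, math.sqrt(2.75))])
print(n_t, mu_t, sigma_t)
```

The loop runs once per local population, so the cost depends on the number of partitions rather than on the number of stored values.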
Below, a concrete example compares the distributed calculation method with the traditional method, the iterative method, and the incremental method in terms of computational complexity, in order to demonstrate the advantages of the distributed calculation method of the present invention.
First input each local population:
Local population $P_1$: {4, 16, 14, 13, 16, -7, -3, 16, 10, -19, 1, -6, 9, -4, 17, 12, 3, 8, 18, 9}
Local population $P_2$: {-3, -12, 3, 4, 7, 13, -15, 16, -15, 19}
Local population $P_3$: {-18, -7, 17, -18, -6, -13, -2, -18, -2, -12, 10, 0, 10, 9, 20}
Calculate the mean $\mu_i$, standard deviation $\sigma_i$, and number of data values $n_i$ of each input local population $P_i$:
The means $\mu_i$ of the local populations are: $\mu_1 = 6.35$, $\mu_2 = 1.7$, $\mu_3 = -2$.
The standard deviations $\sigma_i$ of the local populations are: $\sigma_1 = 9.763580286$, $\sigma_2 = 11.97539143$, $\sigma_3 = 12.35043859$.
The numbers of data values $n_i$ of the local populations are: $n_1 = 20$, $n_2 = 10$, $n_3 = 15$.
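The local statistics listed above can be reproduced with Python's standard `statistics` module, as a quick check that the stated values are population (not sample) standard deviations. The variable names are illustrative.

```python
from statistics import fmean, pstdev

P1 = [4, 16, 14, 13, 16, -7, -3, 16, 10, -19, 1, -6, 9, -4, 17, 12, 3, 8, 18, 9]
P2 = [-3, -12, 3, 4, 7, 13, -15, 16, -15, 19]
P3 = [-18, -7, 17, -18, -6, -13, -2, -18, -2, -12, 10, 0, 10, 9, 20]

# One (n_i, mu_i, sigma_i) triple per local population.
local_stats = [(len(p), fmean(p), pstdev(p)) for p in (P1, P2, P3)]
for i, (n, mu, sigma) in enumerate(local_stats, start=1):
    print(f"P{i}: n={n}, mean={mu:.2f}, std={sigma:.9f}")
```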
Comparison 1: calculating the overall standard deviation of local populations $P_1$, $P_2$, and $P_3$ by the traditional calculation method
The total number of data values is $n_t = n_1 + n_2 + n_3 = 45$; this step involves 2 additions.
Calculate the overall mean:
$$\mu_t = \frac{1}{n_t}\sum_{j=1}^{45} x_j = \frac{114}{45} \approx 2.5333$$
This step involves 44 additions and 1 division.
Calculate the overall standard deviation:
$$\sigma_t = \sqrt{\frac{1}{n_t}\sum_{j=1}^{45}(x_j - \mu_t)^2} \approx 11.77115118$$
This step requires 45 multiplications (squarings), 1 division, 44 additions, 45 subtractions, and 1 square root.
In total, calculating the standard deviation by the traditional method requires 45 multiplications, 2 divisions, 90 additions, 45 subtractions, and 1 square root.
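The operation counts for the traditional method can be checked with a small instrumented sketch. The way operations are tallied here (one addition per accumulated term after the first, one multiplication per squaring) is an assumption chosen to match the counting convention of the text.

```python
import math
from collections import Counter

P1 = [4, 16, 14, 13, 16, -7, -3, 16, 10, -19, 1, -6, 9, -4, 17, 12, 3, 8, 18, 9]
P2 = [-3, -12, 3, 4, 7, 13, -15, 16, -15, 19]
P3 = [-18, -7, 17, -18, -6, -13, -2, -18, -2, -12, 10, 0, 10, 9, 20]

def traditional_std_with_counts(partitions):
    ops = Counter()
    n = len(partitions[0])
    for p in partitions[1:]:            # n_t = n_1 + n_2 + n_3
        n += len(p)
        ops["add"] += 1
    values = [x for p in partitions for x in p]
    total = values[0]
    for x in values[1:]:                # sum of all values
        total += x
        ops["add"] += 1
    mu = total / n
    ops["div"] += 1
    sum_sq = 0.0
    for i, x in enumerate(values):      # sum of squared deviations
        d = x - mu
        ops["sub"] += 1
        sq = d * d
        ops["mul"] += 1
        if i == 0:
            sum_sq = sq
        else:
            sum_sq += sq
            ops["add"] += 1
    sigma = math.sqrt(sum_sq / n)
    ops["div"] += 1
    ops["sqrt"] += 1
    return sigma, ops

sigma_t, ops = traditional_std_with_counts([P1, P2, P3])
print(round(sigma_t, 8), dict(ops))
```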
Comparison 2: calculating the overall standard deviation of local populations $P_1$, $P_2$, and $P_3$ by the iterative calculation method
The standard deviation of local population $P_1$ is known: $\sigma_1 = 9.763580286$. Let the length of the data window be 20, the size of the largest data block.
Calculate the sum of the first 20 values according to formula (4):
$$X_{20} = \sum_{i=1}^{20} x_i = 127$$
This step involves 19 additions.
According to formula (6), calculate the sum of the last $n$ values after the 21st value is added:
$$X_{21} = X_{20} + x_{21} - x_1 = 127 + (-3) - 4 = 120$$
This step involves 1 addition and 1 subtraction.
Calculate the overall standard deviation according to formula (8):
This step involves a total of 3 multiplications (or squarings), 2 divisions, 3 additions, 2 subtractions, and 1 square root.
When 1 new data value is added, the iterative method requires a total of 3 multiplications, 2 divisions, 23 additions, 3 subtractions, and 1 square root.
Processing all of the subsequent data in turn by the steps above gives an overall standard deviation of $\sigma = 13.37310734$.
With the mean and standard deviation of local population $P_1$ known, calculating the overall standard deviation by the iterative method requires a total of 75 multiplications, 50 divisions, 594 additions, 75 subtractions, and 25 square roots.
The time complexity of this algorithm depends on the amount of data in the data blocks and is $O(n-a)$, where $n$ is the total number of data values and the constant $a$ is the number of data values in the first data block. The algorithm has an advantage over the traditional method only when a single value is added; when the amount of new data is large, the amount of calculation grows in proportion to $n$ and can even exceed that of the traditional method. In addition, the result of this method differs from the correct result, so it serves only as an approximate calculation method.
Comparison 3: calculating the overall standard deviation of local populations $P_1$, $P_2$, and $P_3$ by the incremental calculation method
The mean $\mu_1 = 6.35$, standard deviation $\sigma_1 = 9.763580286$, and number of data values $n_1 = 20$ of local population $P_1$ are known.
According to formula (11), calculate the sum of squared deviations of local population $P_1$:
$$S_{20} = n_1 \sigma_1^2 = 20 \times 9.763580286^2 \approx 1906.55$$
This step involves 2 multiplications.
Calculate the mean after the 21st value is added, which gives $\mu_{21} \approx 5.904762$.
This step requires 2 multiplications, 1 division, and 2 additions.
According to formula (12), calculate the sum of squared deviations after the 21st value is added:
$$S_{21} = S_{20} + ((-3) - \mu_1)((-3) - \mu_{21}) = 1989.809524$$
This step involves 1 multiplication, 1 addition, and 2 subtractions.
According to formula (13), calculate:
$$\sigma_{21} = \sqrt{S_{21} / 21} \approx 9.7341$$
This step involves 1 division and 1 square root.
When 1 new data value is added, the method requires a total of 5 multiplications, 2 divisions, 3 additions, 2 subtractions, and 1 square root.
Substituting all of the subsequent data into the steps above in turn gives an overall standard deviation of $\sigma = 11.77115118$.
With the mean and standard deviation of local population $P_1$ known, calculating the standard deviation by the incremental method requires a total of 125 multiplications, 50 divisions, 75 additions, 50 subtractions, and 25 square roots.
The result calculated by this method has no error relative to the exact result. When a single new value arrives, the method makes full use of known conditions and reduces redundant computation. However, as the amount of new data grows, the amount of calculation grows in proportion and may exceed that of the traditional method, although it remains less than that of the iterative method. The time complexity of the incremental algorithm depends on the amount of data in the population and is $O(n-a)$, where $n$ is the total number of data values and the constant $a$ is the number of data values in the first local population.
Comparison 4: calculating the overall standard deviation of local populations $P_1$, $P_2$, and $P_3$ by the distributed calculation method
According to the formula:
$$\mu_t = \frac{n_1\mu_1 + n_2\mu_2 + n_3\mu_3}{n_1 + n_2 + n_3} = \frac{20 \times 6.35 + 10 \times 1.7 + 15 \times (-2)}{45} \approx 2.5333$$
calculate the overall mean $\mu_t$; this step involves 3 multiplications, 1 division, and 4 additions.
Calculate the overall standard deviation using the distributed standard deviation algorithm:
This step involves 12 multiplications, 9 divisions, 14 additions, 9 subtractions, and 1 square root.
Applying the distributed calculation method to the data above requires in total 15 multiplications, 10 divisions, 18 additions, 9 subtractions, and 1 square root.
The result calculated by this algorithm is exact. When the mean and standard deviation of each local population are known, the overall standard deviation can be calculated easily; the known conditions of each data block are fully used, which greatly improves computational efficiency. The computational complexity of the method is independent of the number of data values and depends only on the number of local populations. The time complexity of the algorithm is $O(l)$, where the constant $l$ is the number of local populations.
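As a cross-check on the numbers in this comparison, the three local summaries can be merged pairwise. The sketch below uses the well-known pairwise update for (count, mean, sum of squared deviations), often attributed to Chan, Golub, and LeVeque; it is a stand-in technique, not the patent's own formula, and the function name `merge` is hypothetical.

```python
import math
from functools import reduce

def merge(a, b):
    """Merge two summaries (n, mean, M2), where M2 is the sum of squared deviations."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2

# (n_i, mu_i, sigma_i) for P1, P2, P3 as listed in the example.
local = [(20, 6.35, 9.763580286), (10, 1.7, 11.97539143), (15, -2.0, 12.35043859)]
summaries = [(n, mu, n * sigma ** 2) for n, mu, sigma in local]

n_t, mu_t, m2_t = reduce(merge, summaries)      # one merge per additional partition
sigma_t = math.sqrt(m2_t / n_t)
print(n_t, round(mu_t, 4), round(sigma_t, 6))
```

The reduction performs one merge per additional local population, which is consistent with a cost that depends on the number of partitions only.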
Knowable to calculation procedure required for from above-mentioned various standard deviation computational methods, the incremental computations side of standard deviation of the present invention Method makes amount of calculation substantially reduce, with the obvious advantage, and due to without the frequent all data reading each dispersion storage, saving a large amount of The queried access time, Practical Calculation efficiency has bigger raising.
The following is an example of applying the distributed calculation method for standard deviation of this embodiment to stock market stability analysis.
Fluctuation of stock prices is the manifestation of stock market risk, so stock market risk analysis amounts to analyzing stock price fluctuation. Volatility represents the uncertainty of future prices, and this uncertainty is usually characterized by variance or standard deviation. Table 1 gives stock index statistics for China and the United States over selected periods.
Table 1: Shanghai Stock Exchange Index and Standard & Poor's Index
The following can be obtained by calculation:
Shanghai Stock Exchange Index performance expected value
= (1144.08 + 1686.75 + 4328.92 + 2912.42 + 2736.50 + 2795.42 + 2639.19 + 2211.11 + 2182.53 + 2279.74) / 10 ≈ 2491.6660
Shanghai Stock Exchange Index volatility expected value ≈ 0.3323
Standard & Poor's Index performance expected value ≈ 1356.2570
Standard & Poor's Index volatility expected value ≈ 0.17118
The standard deviations are then calculated according to formula (12) in the background art:
Shanghai Stock Exchange Index performance standard deviation ≈ 800.5983
Shanghai Stock Exchange Index volatility standard deviation ≈ 0.1032
Standard & Poor's Index performance standard deviation ≈ 267.4948
Standard & Poor's Index volatility standard deviation ≈ 0.0736
Because standard deviation is an absolute quantity, the Chinese and US markets cannot be compared directly by standard deviation, whereas the coefficient of variation can be compared directly. Calculation gives:
Shanghai Stock Exchange Index performance coefficient of variation ≈ 800.5983 / 2491.6660 ≈ 0.3213
Shanghai Stock Exchange Index volatility coefficient of variation ≈ 0.1032 / 0.3323 ≈ 0.3105
Standard & Poor's Index performance coefficient of variation ≈ 267.4948 / 1356.2570 ≈ 0.1972
Standard & Poor's Index volatility coefficient of variation ≈ 0.0736 / 0.17118 ≈ 0.4301
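The coefficient of variation is the standard deviation divided by the mean. The short sketch below recomputes the four ratios from the rounded expected values and standard deviations quoted above, so the last digit may differ slightly from the figures in the text. The dictionary keys and function name are illustrative.

```python
def coefficient_of_variation(mean, std):
    """Dimensionless spread: standard deviation relative to the mean."""
    return std / mean

# (expected value, standard deviation) pairs as quoted in the text.
stats = {
    "Shanghai performance": (2491.6660, 800.5983),
    "Shanghai volatility": (0.3323, 0.1032),
    "S&P performance": (1356.2570, 267.4948),
    "S&P volatility": (0.17118, 0.0736),
}

cv = {name: coefficient_of_variation(mean, std) for name, (mean, std) in stats.items()}
for name, value in cv.items():
    print(f"{name}: {value:.4f}")
```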
The comparison shows that the performance coefficient of variation of the Shanghai Stock Exchange Index (0.3213) is greater than that of the Standard & Poor's Index (0.1972), which indicates that over the long term the stability of China's stock market is relatively poor, that is, it is a less mature stock market.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or equivalently substituted without departing from its purpose and scope, and all such modifications shall fall within the scope of the claims of the present invention.

Claims (1)

1. A distributed calculation method for standard deviation, characterized in that it comprises the following steps:
1) input each local population $P_i$;
2) calculate the mean $\mu_i$, the standard deviation $\sigma_i$, and the number of data values $n_i$ of each local population $P_i$;
3) calculate the mean of the input overall population according to the formula $\mu_t = \sum_{i} n_i \mu_i \big/ \sum_{i} n_i$;
4) calculate the standard deviation of the input overall population using the formula for the overall standard deviation.
CN201611032295.6A 2016-11-22 2016-11-22 Distributed calculating method of standard deviation Pending CN106407161A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611032295.6A CN106407161A (en) 2016-11-22 2016-11-22 Distributed calculating method of standard deviation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611032295.6A CN106407161A (en) 2016-11-22 2016-11-22 Distributed calculating method of standard deviation

Publications (1)

Publication Number Publication Date
CN106407161A true CN106407161A (en) 2017-02-15

Family

ID=58082769

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611032295.6A Pending CN106407161A (en) 2016-11-22 2016-11-22 Distributed calculating method of standard deviation

Country Status (1)

Country Link
CN (1) CN106407161A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914870A (en) * 2014-02-28 2014-07-09 天津工业大学 High-universality automatic hologram reestablishing method based on new focus evaluation function
CN104636318A (en) * 2015-02-15 2015-05-20 杭州邦盛金融信息技术有限公司 Distributed or increment calculation method of big data variance and standard deviation

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914870A (en) * 2014-02-28 2014-07-09 天津工业大学 High-universality automatic hologram reestablishing method based on new focus evaluation function
CN104636318A (en) * 2015-02-15 2015-05-20 杭州邦盛金融信息技术有限公司 Distributed or increment calculation method of big data variance and standard deviation

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109100264A (en) * 2018-10-22 2018-12-28 云南中烟工业有限责任公司 A kind of method of quick predict ramuscule cigarette smoking uniformity
CN109100264B (en) * 2018-10-22 2020-11-17 云南中烟工业有限责任公司 Method for rapidly predicting fine cigarette smoking uniformity
CN109341544A (en) * 2018-11-15 2019-02-15 上海航天精密机械研究所 A kind of laser displacement sensor ranging numerical optimization

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170215