CN101131692A - Hierarchical statistics-probability calculation formula seeking algorithm - Google Patents

Hierarchical statistics-probability calculation formula seeking algorithm

Info

Publication number: CN101131692A
Application number: CNA2006100321351A
Authority: CN (China)
Other languages: Chinese (zh)
Inventor: 陈启星
Current assignee: Individual
Original assignee: Individual
Legal status: Pending
Prior art keywords: level, probability, element number, classification, array
Classification: Complex Calculations

Abstract

A distribution (preferably a smooth one) is sorted into an ordered array A(1)~A(n). In the preprocessing phase the total distribution interval is divided into m subintervals. Each subinterval is treated as a right trapezoid whose top is the slanted side, and the area of the trapezoid reflects the number of elements it contains. A record array B(0)~B(m) is then constructed to register the attributes of the trapezoid at each level (start value, number of elements within the level, number of elements before the level, level probability density, probability slope, and so on). First the dividing point of each level, i.e. its start value, is computed. Then "level mapping" is used to compute, element by element, the level to which each element of A(1)~A(n) belongs, and the results are tallied to give the number of elements within each level. The number of elements before the level, the level probability density and the probability slope are then computed and stored in the record array B, which completes preprocessing. In the lookup phase, for any value XY to be looked up, level mapping determines the subinterval (level r) containing XY and hence the number of elements before that level. A probability prediction followed by a probability correction over the trapezoidal subinterval gives the fine in-level prediction for XY, and adding it to the number of elements before the level gives the fine predicted cell.

Description

Hierarchical statistics-probability calculation formula seeking algorithm
Technical field: the invention belongs to the field of computer algorithms and data structures.
Background: among existing search algorithms, the efficient ones include binary search, Fibonacci search and interpolation search, all of which are whole-process searches. Hash lookup is a computational lookup, but it suffers from "collisions" and is therefore not universally applicable. At present no search algorithm provides a reliable computational lookup for an arbitrary ordered table.
Objective of the invention: to propose a computational search algorithm, the "hierarchical statistics-probability calculation formula seeking" algorithm, which has three features: ① its time complexity is O(1), and for a very long ordered table with a smooth distribution the target is usually hit within 3 probes; ② the additional space is small, with complexity O(1): it consists of a single array of records (each with 4 to 5 data fields) holding a few hundred elements; ③ it can search a table sorted in any order.
Summary of the invention
The general idea of the algorithm is as follows. A distribution (preferably a smooth one) is sorted into an ordered array A(1)~A(n), and searching it is split into a preprocessing phase and a lookup phase. In the preprocessing phase the total distribution interval is divided into m subintervals (giving m+1 levels). Each subinterval is treated as a right trapezoid whose top is the slanted side (hereinafter simply "trapezoid"), and the area of the trapezoid reflects the number of elements. A record array B(0)~B(m) is then constructed to register the attributes of the trapezoid at each level (start value, number of elements within the level, number of elements before the level, level probability density, probability slope). First the dividing point of each level, i.e. its "start value", is computed. Then "level mapping" is used to compute, one by one, the level to which each element of A(1)~A(n) belongs, and the results are tallied to give the "number of elements within the level" for every level. The "number of elements before the level", the "level probability density" and the "probability slope" are then computed and registered in the record array B, which completes preprocessing. In the lookup phase, for any value XY to be looked up, level mapping determines the subinterval ("level r") containing XY and hence the "number of elements before the level". A probability prediction and a probability correction over the trapezoidal subinterval then give the "fine in-level prediction" for XY, and adding it to the "number of elements before the level" gives the "fine predicted cell".
Description of drawings:
Fig. 1: schematic diagram of the probability calculation formula seeking algorithm for a trapezoidal distribution. X1: minimum value; Xn: maximum value; Xz: midpoint of X1 and Xn; Xa: midpoint of X1 and Xz; Xb: midpoint of Xz and Xn; XY: value to be looked up. Consistently with the text, PX1, PXa, PXz, PXb, PXn and PXY are the probability densities corresponding to X1, Xa, Xz, Xb, Xn and XY respectively.
Fig. 2: schematic diagram of the hierarchical statistics-probability calculation formula seeking algorithm for an arbitrary smooth distribution. The labels B(0), B(1), B(2), B(r-1), B(r), B(r+1), B(n-1), B(n) stand for B(0).Xmin, B(1).Xmin, B(2).Xmin, B(r-1).Xmin, B(r).Xmin, B(r+1).Xmin, B(n-1).Xmin, B(n).Xmin respectively (the full names do not fit on the figure); XY denotes the value to be looked up. These symbols are used throughout the description and the claims.
Embodiment 1: probability calculation formula seeking algorithm for a trapezoidal distribution
We start by analysing a single table and introduce the "probability calculation formula seeking" algorithm through the right-trapezoid probability distribution of Fig. 1.
Suppose there is an ordered table (ascending) consisting only of the array A(1)~A(n), whose cells hold the values X1~Xn (so X1 is the minimum and Xn the maximum), and that the probability distribution of X1~Xn is linear (Fig. 1). To find which cell of A(1)~A(n) holds a value XY (X1≤XY≤Xn), the answer can be obtained by probability calculation.
Analysis:
Because the area under the probability distribution corresponds to a number of elements, finding an area amounts to finding a number of elements. In Fig. 1 the trapezoid area Sn between X1 and Xn corresponds to n data items (let Sn:=n). In which cell, then, is the value XY to be looked up?
Let Sx be the trapezoid area between X1 and XY; then about Sx data items are smaller than XY. Let K:=round(Sx). From the probability point of view XY should be located at cell K, so the "coarse predicted cell" for XY is A(K).
Of course, there is normally some error between the "coarse predicted cell" and the value XY to be looked up. Let this error be dX and compute the area dS corresponding to dX; then dK:=round(dS) is the correction distance.
Algorithm for the "coarse predicted cell". For convenience of calculation, the invention stipulates that the distribution function tends to n, rather than the customary 1, as its argument tends to +∞, so in this document the element-number density is the probability density. Below, PX1, PXz, PXa, PXb and pxy denote the probability densities at X1, Xz, Xa, Xb and XY respectively; they can equally be read as "element-number densities".
1. As Fig. 1 shows, to obtain an expression for Sx we first need the two probability densities PX1 and pxy (PX1 is the upper base, pxy the lower base, and XY-X1 the height). To this end, let Xz be the midpoint of X1~Xn. The probability density at Xz is
PXz:=n/(Xn-X1);
2. If the slope G is also known, PX1 and pxy can be obtained. One way of finding G is the following. By counting, the number of elements in X1~Xz is Ka, so the probability density at the midpoint Xa of X1~Xz is
PXa:=Ka/(Xz-X1);
Likewise, counting Kb elements in Xz~Xn, the probability density at the midpoint Xb of Xz~Xn is
PXb:=Kb/(Xn-Xz);
So the slope G of the trapezoid is
G:=(PXb-PXa)/(Xb-Xa);
Because (Xn-Xz)=(Xz-X1)=(Xb-Xa), let them all be TX; then
G:=(Kb-Ka)/(TX*TX);
3. Obtain PX1 and pxy from G and PXz.
PX1:=PXz-G*TX;
pxy:=PX1+G*(XY-X1);
4. Determine the trapezoid area Sx between X1 and XY (i.e. the number of elements),
Sx:=(PX1+pxy)*(XY-X1)/2;
5. K:=round(Sx). From the probability point of view, A(K) is the coarse predicted cell for XY, and A(K)≈XY should hold; the error is dX:=XY-A(K).
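Steps 1 to 5 can be sketched in Python as follows. This is a minimal illustration rather than the inventor's implementation: the function name `coarse_predict` is hypothetical, the table is a 0-based Python list while the returned K stays 1-based as in the text, and clamping K to 1..n is an added assumption so that A(K) always exists.

```python
def coarse_predict(a, xy):
    """Return the 1-based coarse predicted cell K for xy in the sorted list a.

    Assumes the density of a is roughly linear and that a[-1] > a[0].
    """
    n = len(a)
    x1, xn = a[0], a[-1]
    tx = (xn - x1) / 2.0                 # TX = Xz-X1 = Xn-Xz = Xb-Xa
    xz = x1 + tx                         # midpoint Xz
    ka = sum(1 for v in a if v < xz)     # Ka: elements counted in X1~Xz
    kb = n - ka                          # Kb: elements counted in Xz~Xn
    g = (kb - ka) / (tx * tx)            # slope G
    pxz = n / (xn - x1)                  # density at Xz
    px1 = pxz - g * tx                   # density at X1
    pxy = px1 + g * (xy - x1)            # density at XY
    sx = (px1 + pxy) * (xy - x1) / 2.0   # trapezoid area between X1 and XY
    k = round(sx)
    return min(max(k, 1), n)             # clamping is an assumption of this sketch
```

For an evenly spaced table the prediction lands within a cell or two of the true position; the error dX = XY - A(K) is then handled by one of the correction algorithms.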
There are several ways to find the "fine predicted cell"; this document proposes the following three.
First algorithm: convert the error value into a span of cells. Compute the "correction area dS", i.e. the area between XX and XX+dX. Note that XX is a dynamic value: at the first correction XX coincides with XY, and afterwards XX is repeatedly updated to XX+dX. pxy, pxx and PXdX are the probability densities at XY, XX and XX+dX respectively.
PXdX:=pxy; {initial value, in preparation for entering the loop}
pxx:=pxy; {pxx coincides with pxy at the first correction}
WHILE ABS(dX)>KK*(1/PXdX) DO {see Note 1}
【pxx:=PXdX; {see Note 2}
PXdX:=PXdX+G*dX;
dS:=(pxx+PXdX)*dX/2;
dK:=round(dS);
K:=K+dK;
dX:=XY-A(K);】
{after leaving the loop, handle three cases according to the size of dX}
IF dX=0 THEN {case 1: XY has been found, at cell K}
ELSE
【IF ABS(dX)<3*(1/PXdX) THEN 【sequential search for XY】 {case 2: the error is less than 3 cells}
ELSE 【binary search for XY】】 {case 3: the error is between 3 and KK cells}
Note 1: in theory the number of corrections is not known in advance, so a loop is used, and the exit condition must be considered. The loop should exit when A(K) is less than KK array cells (for example KK=7) away from XY, because at that point a traditional search (sequential or binary) is faster. (1/PXdX) is the span of X values corresponding to one cell, so an error distance ABS(dX)≤KK*(1/PXdX) means the target XY is within KK cells and the loop can exit.
Note 2: when computing the "correction area dS", pxx is the upper base, PXdX the lower base, and dX the height. Because XX has advanced to XX+dX by the time the correction area is computed, the upper base for the next pass becomes pxx:=PXdX; for the same reason the lower base becomes PXdX:=PXdX+G*dX.
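The correction loop of the first algorithm might look like this in Python. It is a sketch under stated assumptions: `refine` is a hypothetical name, K is kept 1-based while the list is 0-based, and the iteration cap `max_iter`, the clamping of K, and the early exits for a non-positive density or a zero correction are safeguards added here, not part of the source.

```python
def refine(a, xy, k, pxy, g, kk=7, max_iter=20):
    """Iteratively correct the 1-based cell k; return (k, dx)."""
    n = len(a)
    k = min(max(k, 1), n)
    dx = xy - a[k - 1]                   # dX: error in value space
    px_dx = pxy                          # PXdX, initialised to pxy
    for _ in range(max_iter):
        # exit once the target is estimated to be within KK cells (Note 1)
        if px_dx <= 0 or abs(dx) <= kk / px_dx:
            break
        pxx = px_dx                      # upper base of the correction trapezoid
        px_dx = px_dx + g * dx           # lower base (Note 2)
        ds = (pxx + px_dx) * dx / 2.0    # correction area dS
        dk = round(ds)                   # correction distance dK
        if dk == 0:
            break                        # no further progress possible
        k = min(max(k + dk, 1), n)
        dx = xy - a[k - 1]
    return k, dx
```

On a table with constant density (G = 0) a single pass is enough: the area of the correction trapezoid is exactly the number of cells between the prediction and the target.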
Second algorithm: adjust the input value. This merges the search for the coarse predicted cell A(K) and its refinement into a single algorithm.
XX:=XY; {for programming purposes, the value to be looked up is saved as XX}
pxx:=PX1+G*(XX-X1);
WHILE ABS(dX)>KK*(1/pxx) DO {same reasoning as Note 1; dX must be set large enough beforehand for the first pass to run}
【Sx:=(PX1+pxx)*(XX-X1)/2;
K:=round(Sx); {K is the cell number where XX is located}
dX:=XY-A(K);
dK:=round(dX);
XX:=XX+dX;
pxx:=PX1+G*(XX-X1);】
Third algorithm: binary search within a confidence interval. This document does not in fact prove that the first two algorithms can settle XY (find it, or report that it is absent) with absolute reliability, whereas binary search can. Suppose the coarse predicted cell is A(K) and the correction area obtained from dX is dS, with dK:=round(dS) the correction distance. A(K+dK) should then be the "expected cell", the one among A(1)~A(n) most likely to equal XY. If the correction distance is doubled, the confidence that XY lies in the interval A(K)~A(K+2*dK) should be very high. So compare A(K+2*dK) with XY: if XY lies within A(K)~A(K+2*dK), use binary search there; if XY lies outside A(K)~A(K+2*dK), move the range to A(K+2*dK)~A(K+4*dK) and try again.
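The third algorithm can be sketched with Python's standard `bisect` module. The function name `bracket_search` is hypothetical, K is 1-based as in the text, and two details are assumptions of this sketch: the direction of travel is taken from comparing XY with A(K) rather than from the sign of dK, and a zero dK is replaced by a step of one cell.

```python
from bisect import bisect_left

def bracket_search(a, xy, k, dk):
    """Find xy near the 1-based cell k; return its 1-based cell or None."""
    n = len(a)
    step = 2 * max(abs(dk), 1)           # doubled correction distance
    direction = 1 if xy >= a[k - 1] else -1
    lo = k
    hi = min(max(k + direction * step, 1), n)
    # shift the window A(lo)~A(hi) until it brackets xy or reaches the table end
    while 1 < hi < n and (xy - a[hi - 1]) * direction > 0:
        lo, hi = hi, min(max(hi + direction * step, 1), n)
    left, right = min(lo, hi), max(lo, hi)
    pos = bisect_left(a, xy, left - 1, right)   # binary search in cells left..right
    if pos < right and a[pos] == xy:
        return pos + 1
    return None
```

Because the final step is an ordinary binary search over a window known to bracket XY (or to touch the end of the table), the result is reliable even when the probability estimate is poor; only the number of window shifts depends on the estimate.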
Embodiment 2: hierarchical statistics-probability calculation formula seeking algorithm for an arbitrary smooth distribution
Embodiment 1 only solves the probability-calculation lookup for a trapezoidal probability distribution. What should be done for an arbitrary smooth distribution?
Analysis:
For a table with an arbitrary smooth distribution, an obvious idea is to divide the total interval [X1, Xn] into m subintervals so that the probability distribution within each subinterval can be handled as a trapezoidal distribution.
The smooth distribution in Fig. 2 has been divided into m small intervals, each approximately a trapezoid. For a trapezoidal distribution the previous section gave the "probability lookup" solution; the key problem of this section is the "hierarchical algorithm". Its requirement is that, for a value XY to be looked up, the level containing XY is computed directly, without comparisons, and the probability lookup is then applied within that level to compute the cell containing XY.
Suppose there is an ordered array A(1)~A(n) with minimum A(1)=X1 and maximum A(n)=Xn. The hierarchical algorithm comprises two major phases.
Phase one: preprocessing. The purpose is to apply the "level mapping" calculation to the array A(1)~A(n) and to compute and register the attributes of each level. The attributes include each level's start value, element numbers, level probability density (i.e. the probability density at the start value of level r), probability slope, and so on.
1-① Construct a record array to register the attributes of each level: "start value", "level probability density", "level-r probability slope", "number of elements within the level" and "number of elements before the level".
For example, set up a record array B(0)~B(m) as the additional space for hierarchical processing. B(r) (r=0~m) is the cell for level r and contains 5 data fields: (1) B(r).Xmin is the "start value", recording the minimum value of the level-r data (real type); (2) B(r).Nb1 is the "number of elements within the level" for level r (integer type); (3) B(r).Nb2 is the "number of elements before the level" (integer type), i.e. the sum of the element numbers of levels B(0) to B(r-1); (4) B(r).px is the "level probability density"; (5) B(r).G is the "level-r probability slope".
The field B(r).Nb1 may be omitted: each time it is needed, B(r+1).Nb2-B(r).Nb2 can be used in its place. The price of saving this space is extra time.
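A minimal Python sketch of the record array follows. The class name `Level` and the helper names are hypothetical; the five fields mirror B(r).Xmin, B(r).Nb1, B(r).Nb2, B(r).px and B(r).G. The helper `nb1_from_nb2` illustrates the space-saving variant; using n as the "next" value for the last level is an assumption of this sketch, since B(m+1) does not exist.

```python
from dataclasses import dataclass

@dataclass
class Level:
    xmin: float = 0.0   # B(r).Xmin: start value of level r
    nb1: int = 0        # B(r).Nb1: number of elements within the level
    nb2: int = 0        # B(r).Nb2: number of elements before the level
    px: float = 0.0     # B(r).px: probability density at xmin
    g: float = 0.0      # B(r).G: probability slope of the level

def make_table(m):
    """Create the record array B(0)~B(m): m+1 independent records."""
    return [Level() for _ in range(m + 1)]

def nb1_from_nb2(b, r, n):
    """Recover Nb1 of level r from the Nb2 fields (n = total element count)."""
    nxt = b[r + 1].nb2 if r + 1 < len(b) else n
    return nxt - b[r].nb2
```

Dropping `nb1` trades one subtraction per access for one integer per level, which matches the time-for-space remark above.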
The total interval of a smooth distribution is divided into many subintervals, the elements are assigned to levels one by one by "level mapping", and the "start value", "number of elements within the level" and "number of elements before the level" are tallied for every subinterval of the whole table.
1-② Divide the interval [X1, Xn] into m small intervals (i.e. m+1 "levels") and define TX as the "level span", TX:=(Xn-X1)/m;
Set up a loop over B(r) that records the start value of each level B(0)~B(m) in B(r).Xmin, using the formula B(r).Xmin:=X1+r*TX.
FOR r:=0 TO m DO B(r).Xmin:=X1+r*TX;
1-③ Adapting the "hierarchical sorting" algorithm (1) gives the "hierarchical statistics" algorithm. Set up a loop over A(1)~A(n) that applies "level mapping" to A(j) to see which level A(j) belongs to. After traversing A(1)~A(n), B(0).Nb1~B(m).Nb1 have been counted.
FOR j:=1 TO n DO
【r:=trunc(m*(A(j)-X1)/(Xn-X1)); {uniform level mapping; equivalently r:=trunc((A(j)-X1)/TX)}
B(r).Nb1:=B(r).Nb1+1】
1-④ Then compute B(0).Nb2~B(m).Nb2, i.e. the "number of elements before the level" for each level.
B(0).Nb2:=0;
FOR r:=1 TO m DO B(r).Nb2:=B(r-1).Nb2+B(r-1).Nb1;
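Steps 1-② to 1-④ amount to a histogram followed by a prefix sum. The Python sketch below uses parallel lists instead of a record array for brevity; the function name `level_statistics` is hypothetical, and the code assumes a sorted, non-empty list with a[-1] > a[0].

```python
def level_statistics(a, m):
    """Return (tx, xmin, nb1, nb2) for the sorted list a split into m+1 levels."""
    x1, xn = a[0], a[-1]
    tx = (xn - x1) / m                           # level span TX
    xmin = [x1 + r * tx for r in range(m + 1)]   # start value of each level
    nb1 = [0] * (m + 1)
    for v in a:
        r = int(m * (v - x1) / (xn - x1))        # uniform level mapping (trunc)
        nb1[r] += 1                              # count elements within the level
    nb2 = [0] * (m + 1)
    for r in range(1, m + 1):
        nb2[r] = nb2[r - 1] + nb1[r - 1]         # elements before the level
    return tx, xmin, nb1, nb2
```

Note that level m only ever receives elements equal to Xn, because the mapping reaches m exactly when the value is the maximum.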
The "level probability density", the "level-r probability slope" and the "probability density at the value to be looked up" are computed from the "number of elements within the level".
1-⑤ Compute the probability density B(r).px of each level (at B(r).Xmin). It is easy to see that
FOR r:=1 TO m-1 DO B(r).px:=(B(r-1).Nb1+B(r).Nb1)/(2*TX);
B(0).px is computed separately (step 1-⑦); B(m).px is not needed.
1-⑥ Compute the probability slope B(r).G of each level. It is easy to see that
FOR r:=1 TO m-1 DO B(r).G:=(B(r).Nb1-B(r-1).Nb1)/(TX*TX);
B(0).G:=B(1).G; {the two should be approximately equal}
1-⑦ B(0).px:=B(1).px-B(0).G*TX;
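Steps 1-⑤ to 1-⑦ can be sketched as below. The function name `density_and_slope` is hypothetical, the inputs are the per-level counts and the level span, and the sketch assumes m ≥ 2 so that level 1 exists. The entries for level m are left at 0.0 because the text says they are not needed.

```python
def density_and_slope(nb1, tx):
    """Return (px, g): density at each level start and slope of each level."""
    m = len(nb1) - 1
    px = [0.0] * (m + 1)
    g = [0.0] * (m + 1)
    for r in range(1, m):
        # density at the boundary = mean density of the two adjacent levels
        px[r] = (nb1[r - 1] + nb1[r]) / (2 * tx)
        # slope = difference of the two level densities over one level span
        g[r] = (nb1[r] - nb1[r - 1]) / (tx * tx)
    g[0] = g[1]                      # level 0 borrows the slope of level 1
    px[0] = px[1] - g[0] * tx        # extrapolate the density back to X1
    return px, g
```

A quick consistency check: with counts [2, 4, 6, 1] and TX = 2, the trapezoid for level 0 has bases 0.5 and 1.5 and height 2, so its area is 2, which matches the count of level 0.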
Phase two: the lookup phase, divided into four sub-steps. Let the value to be looked up be XY.
2-① Level lookup sub-step. The "level mapping" formula r:=trunc((XY-X1)/TX) directly computes the level r containing XY.
2-② Probability calculation sub-step. Using the "level probability density" and "level-r probability slope" of the level r to which the value belongs, together with the "probability density pxy at the value to be looked up", compute the area of the probability distribution enclosed between the value to be looked up and the "start value". This is the "coarse in-level prediction"; adding it to the "number of elements before the level" gives the "coarse predicted cell".
Define the probability density at XY as pxy:
pxy:=B(r).px+B(r).G*(XY-B(r).Xmin);
2-③ The area (i.e. number of elements) between B(r).Xmin and XY is Sx, called the "element-number probability".
Sx:=(B(r).px+pxy)*(XY-B(r).Xmin)/2;
2-④ K:=B(r).Nb2+round(Sx);
The coarse predicted cell for XY is A(K), where K is the "number of elements before the level" plus the "element-number probability".
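Sub-steps 2-① to 2-④ reduce to a handful of arithmetic operations. In the Python sketch below, `coarse_cell` is a hypothetical name and the level attributes are passed as parallel lists (xmin, nb2, px, g) standing in for the fields of B(r); XY is assumed to lie within [X1, Xn].

```python
def coarse_cell(xy, x1, tx, xmin, nb2, px, g):
    """Return (r, K): the level of xy and its 1-based coarse predicted cell."""
    r = int((xy - x1) / tx)                      # 2-①: level mapping
    pxy = px[r] + g[r] * (xy - xmin[r])          # 2-②: density at XY
    sx = (px[r] + pxy) * (xy - xmin[r]) / 2.0    # 2-③: area from Xmin to XY
    k = nb2[r] + round(sx)                       # 2-④: add elements before level
    return r, k
```

No comparison against table entries is needed to reach K; the first access to A happens only when the error dX = XY - A(K) is evaluated.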
Error correction sub-step. After computing the error dX between the "coarse predicted cell A(K)" and the value XY to be looked up, compute the trapezoid area dS corresponding to dX; the upper base of this trapezoid is pxx, the lower base is PXdX, and the height is dX. This trapezoid is the "correction probability value". Correcting the "coarse in-level prediction" with the "correction probability value" gives the "fine in-level prediction", and adding the "number of elements before the level" gives the "fine predicted cell".
2-⑤ PXdX:=pxy;
pxx:=pxy; {pxx coincides with pxy at the first correction}
WHILE ABS(dX)>KK*(1/PXdX) DO
【pxx:=PXdX;
PXdX:=PXdX+B(r).G*dX; {G is the slope of the current level r}
dS:=(pxx+PXdX)*dX/2;
dK:=round(dS);
K:=K+dK;
dX:=XY-A(K);】
Target determination sub-step. {handle three cases according to dX}
IF dX=0 THEN {case 1: XY has been found, at cell K}
ELSE
【IF ABS(dX)<3*(1/PXdX) THEN 【sequential search for XY】 {case 2: the error is less than 3 cells}
ELSE 【binary search for XY】】 {case 3: the error is between 3 and KK cells}
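The three-way target determination can be sketched as follows. `settle` is a hypothetical name and K is 1-based. Two choices here are assumptions made for reliability, not statements of the source: the sequential scan runs until it passes XY rather than stopping after 3 cells, and the binary search covers the whole side of the table on which XY must lie rather than only KK cells.

```python
from bisect import bisect_left

def settle(a, xy, k, px_dx, kk=7):
    """Return the 1-based cell of xy, or None if xy is not in the table."""
    n = len(a)
    dx = xy - a[k - 1]
    if dx == 0:
        return k                                   # case 1: direct hit
    cell = 1.0 / px_dx if px_dx > 0 else float("inf")
    if abs(dx) < 3 * cell:                         # case 2: sequential search
        step = 1 if dx > 0 else -1
        i = k
        while 1 <= i <= n and (xy - a[i - 1]) * step > 0:
            i += step
        return i if 1 <= i <= n and a[i - 1] == xy else None
    # case 3: binary search on the side of A(K) where xy must lie
    lo, hi = (k, n) if dx > 0 else (0, k - 1)
    pos = bisect_left(a, xy, lo, hi)
    return pos + 1 if pos < hi and a[pos] == xy else None
```

The parameter `kk` is kept only to mirror the text; with the whole-side binary search chosen here it does not influence the result.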
Analysis of the factors affecting lookup time
The lookup time consists of three parts: level calculation + in-level coarse prediction + correction calculation. The level calculation and the coarse prediction are always required, while the correction calculation has the greatest effect on lookup time, and its cost depends mainly on the number of correction passes. The smaller the calculation error, therefore, the fewer the correction passes. Where does the calculation error come from? ① The level calculation yields the "number of elements before the level"; this is a tallied count and cannot be wrong. ② The in-level coarse prediction and the correction amounts are both obtained by probability calculation, so errors may appear. The key factor for the hit rate is the quality of the subinterval trapezoids: the straighter the top of the trapezoid, the better the probability calculation matches reality and the higher the hit rate. The hit rate of the in-level coarse prediction thus depends on two factors. One is the smoothness of the lookup table itself: a value XY that falls in a smooth region is hit more often. The other is the number of levels m: the larger m is, the straighter the tops of the subinterval trapezoids.
The number of levels m therefore involves a trade-off between lookup time and space: the larger m is, the better the subinterval trapezoids and the easier it is to hit the target, but an m that is too large makes the space cost excessive. How large should m be? A step that checks whether m is reasonable can be added to the preprocessing phase. When B(r).G changes too much relative to B(r-1).G, the number of levels is insufficient. Let D be the assessment threshold for the division and compare in a loop: whenever |B(r).G-B(r-1).G|>D occurs, double m. As long as m is at a reasonable value, the correction calculation should hit the target within two passes.
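The doubling rule for m can be sketched in Python as below. The names `level_slopes` and `choose_m` are hypothetical, and the cap `m_max` is an assumption added here: with a fixed threshold D and finite data the slopes may never settle, so an upper bound is needed to guarantee termination.

```python
def level_slopes(a, m):
    """Slopes G of levels 1..m-1 for the sorted list a (assumes a[-1] > a[0])."""
    x1, xn = a[0], a[-1]
    tx = (xn - x1) / m
    nb1 = [0] * (m + 1)
    for v in a:
        nb1[int(m * (v - x1) / (xn - x1))] += 1    # uniform level mapping
    return [(nb1[r] - nb1[r - 1]) / (tx * tx) for r in range(1, m)]

def choose_m(a, m, d, m_max=1024):
    """Double m while any adjacent pair of slopes differs by more than d."""
    while m < m_max:
        g = level_slopes(a, m)
        if all(abs(g[i] - g[i - 1]) <= d for i in range(1, len(g))):
            break                                  # the division is adequate
        m *= 2
    return m
```

For an evenly spaced table every slope is zero and the initial m is accepted; for strongly skewed data the loop keeps doubling until it reaches the cap, which suggests that a suitable D depends on the data.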
Embodiment 3: hierarchical positioning sort and hierarchical positioning lookup algorithm
It comprises 4 phases:
① Hierarchical sorting phase. An array X(j) (j=0, 1, ..., n) is waiting to be sorted, and max and min are its maximum and minimum values. Construct an array of linked lists C(r) as the grading array, with level numbers r=0, 1, ..., m. C(r) contains a pointer field C(r)^.link, used to link the cells of X(j) that belong to level r, and an integer field C(r)^.Nb1, used to record the number of elements within level r. Assign each X(j) to a level one by one with the level mapping r:=trunc(m*(X(j)-min)/(max-min)) and link it into the corresponding level. After one traversal of X(j), C(r)^.link links every cell of X(j) that belongs to level r, and C(r)^.Nb1 records the number of elements within level r.
② Positioning sort phase. This comprises two tasks. One is to sort the data linked by C(r)^.link within each level, level by level, and collect them into the array D(0)~D(n), which becomes the ordered table. The other is to set up a record array E(0)~E(m) as the additional space for hierarchical processing. E(r) (r=0~m) is the cell for level r and contains 2 data fields: E(r).Nb1 is the "start position" of level r within D(0)~D(n) (integer type), and E(r).Nb2 is the "end position" of level r within D(0)~D(n).
③ Level lookup phase. For a value XY to be looked up, first apply the level mapping r:=trunc(m*(XY-min)/(max-min)) to compute the level r it belongs to; the "start position" E(r).Nb1 and "end position" E(r).Nb2 of XY within D(0)~D(n) are then known.
④ Positioning lookup phase. Because the arrays built by hierarchical statistics and hierarchical sorting determine the "start position" and "end position" of any level, a traditional efficient search (binary search, Fibonacci search or interpolation search) can be carried out within any level.
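The four phases can be sketched in Python as a bucket sort that remembers the bucket boundaries. The names `level_sort` and `level_search` are hypothetical; Python lists stand in for the linked lists C(r), positions are 0-based, and each entry of `e` is a half-open (start, end) pair rather than the inclusive start and end positions E(r).Nb1 and E(r).Nb2. The sketch assumes max > min.

```python
from bisect import bisect_left

def level_sort(x, m):
    """Phases ① and ②: return (d, e, lo, hi) for the unsorted list x."""
    lo, hi = min(x), max(x)
    buckets = [[] for _ in range(m + 1)]          # stands in for C(0)~C(m)
    for v in x:
        buckets[int(m * (v - lo) / (hi - lo))].append(v)
    d, e = [], []
    for b in buckets:
        b.sort()                                  # sort within the level
        start = len(d)
        d.extend(b)                               # collect into the ordered table
        e.append((start, len(d)))                 # half-open range of the level
    return d, e, lo, hi

def level_search(d, e, lo, hi, m, xy):
    """Phases ③ and ④: return the 0-based position of xy in d, or None."""
    if not lo <= xy <= hi:
        return None
    start, end = e[int(m * (xy - lo) / (hi - lo))]    # same mapping as sorting
    pos = bisect_left(d, xy, start, end)              # binary search in the level
    return pos if pos < end and d[pos] == xy else None
```

Using exactly the same mapping expression in both functions matters: it guarantees that a value is searched in the level it was sorted into, even when floating-point rounding puts a boundary value on one side or the other.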

Claims (11)

1. A hierarchical statistics-probability calculation formula seeking algorithm, characterized in that: a distribution (preferably a smooth one) is sorted into an ordered array A(1)~A(n), and searching it is split into a preprocessing phase and a lookup phase. In the preprocessing phase the total distribution interval is divided into m subintervals (giving m+1 levels); each subinterval is treated as a right trapezoid whose top is the slanted side (hereinafter simply "trapezoid"), and the area of the trapezoid reflects the number of elements. A record array B(0)~B(m) is then constructed to register the attributes of the trapezoid at each level (start value, number of elements within the level, number of elements before the level, level probability density, probability slope). First the dividing point of each level, i.e. its "start value", is computed; then "level mapping" is used to compute, one by one, the level to which each element of A(1)~A(n) belongs, and the results are tallied to give the "number of elements within the level" for every level; the "number of elements before the level", the "level probability density" and the "probability slope" are then computed and registered in the record array B, which completes preprocessing. In the lookup phase, for any value XY to be looked up, level mapping determines the subinterval ("level r") containing XY (and hence the "number of elements before the level"); a probability prediction and a probability correction over the trapezoidal subinterval then give the "fine in-level prediction" for XY, and adding it to the "number of elements before the level" gives the "fine predicted cell".
2. The hierarchical statistics-probability calculation formula seeking algorithm according to claim 1, further characterized in that: a record array is constructed to register the attributes of each level, namely "start value", "level probability density", "level-r probability slope", "number of elements within the level" and "number of elements before the level".
3. For example, a record array B(0)~B(m) is set up as the additional space for hierarchical processing. B(r) (r=0~m) is the cell for level r and contains 5 data fields: (1) B(r).Xmin is the "start value", recording the minimum value of the level-r data (real type); (2) B(r).Nb1 is the "number of elements within the level" for level r (integer type); (3) B(r).Nb2 is the "number of elements before the level" (integer type), i.e. the sum of the element numbers of levels B(0) to B(r-1); (4) B(r).px is the "level probability density"; (5) B(r).G is the "level-r probability slope";
The field B(r).Nb1 may be omitted: each time it is needed, B(r+1).Nb2-B(r).Nb2 can be used in its place. The price of saving this space is extra time.
4. The hierarchical statistics-probability calculation formula seeking algorithm according to claim 1, further characterized in that: the total interval of a smooth distribution is divided into many subintervals, the elements are assigned to levels one by one by "level mapping", and the "start value", "number of elements within the level" and "number of elements before the level" are tallied for every subinterval of the whole table;
Finding the "start value" of each level:
The interval [X1, Xn] is divided into m small intervals (i.e. m+1 "levels"), and TX is defined as the "level span",
TX:=(Xn-X1)/m;
Set up a loop over B(r) that records the start value of each level B(0)~B(m) in B(r).Xmin, using B(r).Xmin:=X1+r*TX.
FOR r:=0 TO m DO B(r).Xmin:=X1+r*TX;
Finding the "number of elements within the level" for each level by level mapping:
Set up a loop over A(1)~A(n) that applies "level mapping" to A(j) to see which level A(j) belongs to. After traversing A(1)~A(n), B(0).Nb1~B(m).Nb1 have been counted.
FOR j:=1 TO n DO
【r:=trunc(m*(A(j)-X1)/(Xn-X1)); {uniform level mapping; equivalently r:=trunc((A(j)-X1)/TX)}
B(r).Nb1:=B(r).Nb1+1】
Finding the "number of elements before the level" for each level:
B(0).Nb2:=0;
FOR r:=1 TO m DO B(r).Nb2:=B(r-1).Nb2+B(r-1).Nb1;
5. The hierarchical statistics-probability calculation formula seeking algorithm according to claim 1, further characterized in that: the "level probability density", the "level-r probability slope" and the "probability density at the value to be looked up" are computed from the "number of elements within the level";
Compute the probability density B(r).px of each level (at B(r).Xmin):
FOR r:=1 TO m-1 DO B(r).px:=(B(r-1).Nb1+B(r).Nb1)/(2*TX);
Compute the probability slope B(r).G of each level:
FOR r:=1 TO m-1 DO B(r).G:=(B(r).Nb1-B(r-1).Nb1)/(TX*TX);
B(0).G:=B(1).G; {the two should be approximately equal}
B(0).px:=B(1).px-B(0).G*TX; {uses B(0).G, so it is computed last}
6. The hierarchical statistics-probability calculation formula seeking algorithm according to claim 1, further characterized in that: the level mapping r:=trunc((XY-X1)/TX)
computes the level to which the value XY to be looked up belongs, and the "number of elements before the level" of that level is obtained from the registered attributes.
7. The hierarchical statistics-probability calculation formula seeking algorithm according to claim 1, further characterized in that: in the probability calculation sub-step, the "level probability density" and "level-r probability slope" of the level r to which the value belongs, together with the "probability density at the value to be looked up", are used to compute the area of the probability distribution enclosed between the value to be looked up and the "start value"; this is the "coarse in-level prediction", and adding it to the "number of elements before the level" gives the "coarse predicted cell".
Probability calculation sub-step, probability density pxy:
pxy:=B(r).px+B(r).G*(XY-B(r).Xmin);
The number of elements (i.e. the area) between B(r).Xmin and XY is Sx, called the "element-number probability".
Sx:=(B(r).px+pxy)*(XY-B(r).Xmin)/2;
The coarse predicted cell for XY is K, the "number of elements before the level" plus the "element-number probability".
K:=B(r).Nb2+round(Sx);
The error is dX:=XY-A(K);
8. hierarchical statistics-probability calculation formula seeking algorithm according to claim 1, its further feature is: after the error amount that calculates " thick predicting unit " and " number to be looked into ", obtain error amount pairing " correction probable value " by probability calculation, after with " correction probable value " " level in thick premeasuring " being revised, obtain " smart premeasuring in the level ", obtain " smart predicting unit " with " element number before the level " addition.
The error correction sub.
PXdX:=pxy;
pxx:=pxy
WHILE?ABS(dX)>KK*(1/PXdX)THEN
【pxx:=PXdX;
PXdX:=PXdX+G*dX;
dS:=(pxx+PXdX)*dX/2
dK:=round(dS);
K:=K+dK;
dX:=XY-A(K); 】
9. The hierarchical statistics-probability calculation formula seeking algorithm according to claim 1, further characterized in that: after the "fine predicted unit" has been obtained, a traditional search algorithm is then applied to determine the position of the "number to be found" in the table, or to confirm that the "number to be found" is not in the table;
Target determination phase.
IF dX=0 THEN [XY has been found, at unit K]
ELSE
[IF ABS(dX)<3*(1/PXdX) THEN [sequential search for XY]
ELSE [binary search for XY]]
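The target determination phase of claim 9 might look like the following Python sketch. The function name `locate` and the choice to run the binary search over the whole table with the standard library's `bisect_left` are assumptions; the claim only says that a sequential or binary search follows, without fixing its range.

```python
from bisect import bisect_left


def locate(a, xy, k, px_dx):
    """Return the index of xy in the sorted list a, or None if it is absent.

    k is the fine predicted unit and px_dx the density used for the threshold.
    """
    dx = xy - a[k]
    if dx == 0:
        return k                          # XY is exactly at unit K
    if px_dx > 0 and abs(dx) < 3 / px_dx:
        # Close enough: walk element by element toward xy.
        step = 1 if dx > 0 else -1
        i = k + step
        while 0 <= i < len(a) and (xy - a[i]) * step > 0:
            i += step
        return i if 0 <= i < len(a) and a[i] == xy else None
    # Otherwise fall back to a binary search (here over the whole table).
    i = bisect_left(a, xy)
    return i if i < len(a) and a[i] == xy else None
```

With `a = [1, 3, 5, 7, 9, 11]`, a call such as `locate(a, 7, 2, 1.0)` takes the sequential branch and returns 3, while `locate(a, 6, 2, 1.0)` reports that 6 is not in the table by returning `None`.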
10. The hierarchical statistics-probability calculation formula seeking algorithm according to claim 1, further characterized in that: one of its special cases is a hierarchical sorting and search algorithm, in which the hierarchical algorithm is applied at sorting time; hierarchical positioning sort comprises two stages, hierarchical sorting and positioning sort;
In the hierarchical sorting stage, an array X(j) (j=0,1,...,n) is waiting to be sorted, where max and min are its maximum and minimum values respectively. To this end, an array of linked lists C(r) is constructed as the level array, with level number r=0,1,...,m. C(r) contains a pointer field C(r)^.link, used to link the units of X(j) that belong to level r, and an integer field C(r)^.Nb1, used to record the number of elements in level r. Each X(j) is classified in turn with the level mapping r:=trunc(m*(X(j)-min)/(max-min)) and linked into the corresponding level. After one pass over X(j), C(r)^.link links every unit of X(j) that belongs to level r, and C(r)^.Nb1 records the number of elements in level r;
The positioning sort stage comprises two tasks: one is to sort, level by level, the data linked by C(r)^.link within each level and collect it into array D(0)~D(n), which becomes an ordered table; the other is to build a record array E(0)~E(m) as the additional space for hierarchical processing, where E(r) (r=0~m) is the unit of level r and contains 2 data fields: E(r).Nb1 is defined as the "start position" (integer type) of level r in D(0)~D(n); E(r).Nb2 is defined as the "end position" of level r in D(0)~D(n).
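The two sorting stages of claim 10 can be sketched in Python as below. This is an illustration only: plain Python lists stand in for the linked lists C(r), the built-in `list.sort` stands in for the unspecified in-level sort, and the function name `hierarchical_sort` plus the convention that an empty level gets an end position one less than its start position are assumptions.

```python
def hierarchical_sort(x, m):
    """Sort x via m+1 levels; return (d, e, lo, hi).

    d is the ordered table D, e[r] = (start, end) plays the role of
    (E(r).Nb1, E(r).Nb2), and lo/hi are the min and max of x.
    """
    lo, hi = min(x), max(x)
    levels = [[] for _ in range(m + 1)]        # C(0)..C(m)
    # Hierarchical sorting stage: one pass, each value goes to its level.
    for v in x:
        r = int(m * (v - lo) / (hi - lo)) if hi > lo else 0
        levels[r].append(v)
    # Positioning sort stage: sort inside each level and collect into d.
    d, e = [], []
    for level in levels:
        level.sort()
        start = len(d)
        d.extend(level)
        e.append((start, len(d) - 1))          # end < start marks an empty level
    return d, e, lo, hi
```

For `x = [0, 10, 10, 1]` and `m = 2`, the values 0 and 1 fall in level 0, level 1 stays empty, and both copies of 10 fall in level 2, so `d` is `[0, 1, 10, 10]` and `e` is `[(0, 1), (2, 1), (2, 3)]`.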
11. The hierarchical statistics-probability calculation formula seeking algorithm according to claim 1, further characterized in that: on the basis of hierarchical sorting or hierarchical statistics, a hierarchical positioning search is carried out, comprising two stages, hierarchical search and positioning search;
Hierarchical search stage. For a number XY to be found, the level is first calculated with the level mapping r:=trunc(m*(XY-min)/(max-min)), giving the level r to which XY belongs; the "start position" and "end position" of that level in the array are then known.
Positioning search stage. A traditional efficient search (including binary search, Fibonacci search, interpolation search) is performed on the data of level r in the array (that is, the data between the "start position" and the "end position").
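The two search stages of claim 11 can be sketched in Python as follows, assuming an ordered table `d` and a list `e` of (start position, end position) pairs per level have already been built, with an empty level marked by an end position smaller than its start position. The function name `hierarchical_search` is hypothetical, and binary search via the standard library's `bisect_left` is just one of the traditional searches the claim allows.

```python
from bisect import bisect_left


def hierarchical_search(d, e, lo, hi, xy):
    """Return the index of xy in the ordered table d, or None if absent.

    e[r] = (start, end) gives the positions of level r in d;
    lo and hi are the minimum and maximum values stored in d.
    """
    m = len(e) - 1
    if xy < lo or xy > hi:
        return None                       # outside the table's value range
    # Hierarchical search stage: map xy to its level r.
    r = int(m * (xy - lo) / (hi - lo)) if hi > lo else 0
    start, end = e[r]
    # Positioning search stage: binary search restricted to level r.
    i = bisect_left(d, xy, start, end + 1)
    return i if i <= end and d[i] == xy else None
```

For example, with `d = [0, 1, 10, 10]`, `e = [(0, 1), (2, 1), (2, 3)]`, `lo = 0` and `hi = 10`, searching for 10 inspects only positions 2 to 3 and returns 2, while searching for 5 lands in the empty level 1 and returns `None`.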
CNA2006100321351A 2006-08-25 2006-08-25 Hierarchical statistics-probability calculation formula seeking algorithm Pending CN101131692A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2006100321351A CN101131692A (en) 2006-08-25 2006-08-25 Hierarchical statistics-probability calculation formula seeking algorithm

Publications (1)

Publication Number Publication Date
CN101131692A true CN101131692A (en) 2008-02-27

Family

ID=39128962

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2006100321351A Pending CN101131692A (en) 2006-08-25 2006-08-25 Hierarchical statistics-probability calculation formula seeking algorithm

Country Status (1)

Country Link
CN (1) CN101131692A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014117566A1 (en) * 2013-01-29 2014-08-07 Tencent Technology (Shenzhen) Company Limited Ranking method and system
CN104714967A (en) * 2013-12-14 2015-06-17 中国航空工业集团公司第六三一研究所 Two-dimensional interpolation method based on dimensionality reduction
CN105808654A (en) * 2016-02-29 2016-07-27 湖南蚁坊软件有限公司 Stream data-oriented two-level sorting method
CN106597243A (en) * 2017-02-14 2017-04-26 吴笃贵 Probability characteristic parameter extraction method based on partial discharge holographic data
CN111078683A (en) * 2019-11-02 2020-04-28 国网辽宁省电力有限公司经济技术研究院 Interpolation search-based power grid ledger data filling and counting method and device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014117566A1 (en) * 2013-01-29 2014-08-07 Tencent Technology (Shenzhen) Company Limited Ranking method and system
CN104714967A (en) * 2013-12-14 2015-06-17 中国航空工业集团公司第六三一研究所 Two-dimensional interpolation method based on dimensionality reduction
CN104714967B (en) * 2013-12-14 2017-10-24 中国航空工业集团公司第六三一研究所 A kind of method for determining automobile fan control mode
CN105808654A (en) * 2016-02-29 2016-07-27 湖南蚁坊软件有限公司 Stream data-oriented two-level sorting method
CN106597243A (en) * 2017-02-14 2017-04-26 吴笃贵 Probability characteristic parameter extraction method based on partial discharge holographic data
CN106597243B (en) * 2017-02-14 2018-12-07 吴笃贵 A kind of probability characteristics parameter extracting method based on shelf depreciation holographic data
CN111078683A (en) * 2019-11-02 2020-04-28 国网辽宁省电力有限公司经济技术研究院 Interpolation search-based power grid ledger data filling and counting method and device

Similar Documents

Publication Publication Date Title
CN107357846B (en) The methods of exhibiting and device of relation map
CN102915347B (en) A kind of distributed traffic clustering method and system
US6148295A (en) Method for computing near neighbors of a query point in a database
CN103631911B (en) OLAP query processing method based on storage of array and Vector Processing
CN104598647B (en) A kind of tree graph search and the method for matching article
CN101131692A (en) Hierarchical statistics-probability calculation formula seeking algorithm
CN105657064B (en) Swift load-balancing method based on dummy node storage optimization
CN101681368A (en) Aggregation query processing
JP2002529819A (en) Method and apparatus for occupying sparse matrix entries with corresponding data
CN110069500B (en) Dynamic mixed indexing method for non-relational database
CN106528815A (en) Method and system for probabilistic aggregation query of road network moving objects
CN106844534A (en) Towards the GeoHash coding methods by geographical spatial data one-dimensional of NoSQL databases
US11036709B2 (en) Single-level, multi-dimension, hash-based table partitioning
KR100828404B1 (en) Processing method of data stream using Border Monitoring Query
CN106844666B (en) Self-adaptive time series data query method
CN103914456A (en) Data storage method and system
CN102298650A (en) Distributed recommendation method of massive digital information
CN107506490A (en) Preferential search algorithm and system based on position top k keyword queries under sliding window
CN106874384A (en) A kind of isomery address standard handovers and matching process
CN100530192C (en) Text searching method and device
Kobza et al. Divergence measures on hesitant fuzzy sets
CN110134683A (en) The partition zone optimizing research method and system that magnanimity element stores in relational database
CN109635069A (en) A kind of geographical spatial data self-organizing method based on comentropy
CN107894997B (en) Industrial time sequence data query processing method and system
US20090150393A1 (en) Method for assignment of point level address geocodes to street networks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20080227