A Distance-Based Algorithm for Computing a Representative Node Set in Two-Dimensional Space
Technical Field
The present invention relates to a new distance-based algorithm for computing a representative node set in two-dimensional space (New Distance-based Representative Skyline, NDRS for short).
Background Art
The Skyline query problem was proposed by Borzsonyi in 2001. In recent years, Skyline query processing has attracted the attention of many database researchers and has quickly become a research hotspot in the database field, giving rise to a large number of Skyline query processing algorithms. Existing results mainly cover three areas:
(1) Skyline query processing on centralized databases: including the BNL (Block Nested Loop) algorithm, the SFS (Sort Filter Skyline) algorithm, the Divide and Conquer algorithm, the Bitmap algorithm, the Index algorithm [5], the NN (Nearest Neighbor) algorithm, BBS (Branch and Bound Skyline) [3], and others.
(2) Skyline query processing on distributed databases: including Skyline computation in general distributed environments, Skyline computation in special distributed environments, and Skyline computation on peer-to-peer networks.
(3) Skyline query processing under other computation models: including estimation of the Skyline result set size and computation cost, Skyline query processing on an arbitrary subspace, Skyline query processing on all subspaces, and Skyline query processing on data streams.
Although Skyline algorithms under the traditional definition are relatively mature, many aspects still need improvement. The most prominent is that, in many application scenarios, the number of Skyline points returned by a Skyline query is so large that the result cannot serve multi-objective decision-making well.
To address this problem, Lin Xuemin et al. first proposed the concept of the "Top-k Representative Skyline" (TRS) in 2007: k Skyline points are selected from the Skyline point set as "representative Skyline points", and the selected k points must satisfy the following condition: the number of data points dominated by the selected k Skyline points is greater than or equal to the number of data points dominated by any other k Skyline points chosen from the Skyline point set. Dominance is defined as follows: given two d-dimensional points x = (x_1, x_2, …, x_d) and y = (y_1, y_2, …, y_d), if x_i ≤ y_i for every integer i ∈ [1, d] and there exists an i such that x_i < y_i, then point x dominates point y. Similarly, Tao Yufei et al. proposed the "Distance-based Representative Skyline" strategy in 2009, defined as follows: the selected k points minimize Er(κ, S), where Er(κ, S) = max_p min_{p′ ∈ κ} ||p, p′|| with p ranging over the non-representative Skyline points; that is, Er(κ, S) is the largest distance from any non-representative Skyline point in the Skyline point set to its nearest representative Skyline point. The final representative Skyline point set is then obtained by a dynamic programming algorithm, called the DRS algorithm. Here κ denotes the set of "representative Skyline" points and S denotes the set of all nodes in the data set. This patent mainly adopts the distance-based definition of the representative Skyline.
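The two definitions above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names `dominates` and `representation_error` and the sample points are hypothetical, and the Euclidean distance is used for ||p, p′||.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def dominates(x, y):
    """True if x dominates y: x_i <= y_i in every dimension and x_i < y_i in at least one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def representation_error(reps, skyline):
    """Er: largest distance from a non-representative Skyline point to its nearest representative."""
    others = [p for p in skyline if p not in reps]
    return max((min(dist(p, r) for r in reps) for p in others), default=0.0)

# Hypothetical Skyline (no point dominates another) and two chosen representatives.
skyline = [(1, 9), (2, 7), (4, 4), (7, 2), (9, 1)]
print(dominates((1, 2), (1, 3)))                          # True
print(dominates((1, 3), (2, 2)))                          # False
print(representation_error([(2, 7), (7, 2)], skyline))    # sqrt(13), about 3.61
```

Here the worst-served point is (4, 4), which lies at distance sqrt(13) from both representatives, so that value is the error of this particular choice.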
Of the above algorithms, TRS has an obvious drawback: the selected "representative Skyline points" are biased and tend to fall where the data points are relatively dense. Its time complexity, O(km² + n log m), is also too high, where n is the number of points in the data set and m is the number of Skyline points. The effectiveness of TRS is therefore hard to guarantee when the data set is large and the data distribution is concentrated. The DRS algorithm overcomes the inability of TRS to represent the whole Skyline point set well, but its time complexity, O(|D|·log m + m²(k-2)), where |D| is the size of the data set, is still too high, which severely limits the algorithm's range of application.
Summary of the Invention
The present invention provides a new distance-based algorithm for computing a representative Skyline node set in two-dimensional space. Its time complexity is O(k² log³ m), far lower than that of the DRS algorithm.
The invention is realized by the following technical means:
A distance-based algorithm for computing a representative node set in two-dimensional space, comprising the following steps:
S1: input the data set and compute its Skyline point set Q with the BNL algorithm; after sorting Q, compute and store the Manhattan distance from the initial point to every other Skyline point;
S2: compute the k representative Skyline points in the Skyline point set. First, introduce the Testing(B) algorithm, which determines whether the region covered by k circles of radius no greater than B can contain the entire Skyline point set; it returns true if so and false otherwise. Then define k subscripts {S_0, S_1, …, S_{k-1}} with S_0 = 1 such that, for 1 ≤ i ≤ k-1, Testing(radius(S_{i-1}, S_i)) is false and Testing(radius(S_{i-1}, S_i + 1)) is true; then [equation missing in the source]; let B_opt = Er(κ, S), where κ denotes the set of "representative Skyline points", whose size k is an input value, S denotes the set of all nodes in the data set, and ||p, p′|| denotes the Euclidean distance between points p and p′;
S3: substitute B_opt into Testing(B_opt), which returns the k representative Skyline points.
Here, for the Skyline points {p_i, …, p_j} of the data set covered by a region [i, j], center(i, j) denotes their center point and radius(i, j) denotes the minimum radius of a covering circle centered at that point, with 1 ≤ i ≤ j ≤ m, where m is the number of elements in the Skyline point set. center(i, j) = p_u is defined such that u ∈ [i, j] minimizes max(||p_i, p_u||, ||p_u, p_j||), and radius(i, j) is that minimum value.
In two-dimensional space, the representative Skyline computed by the NDRS algorithm accurately represents the Skyline point set as a whole. The k "representative Skyline" points are returned to the decision maker; the interval corresponding to the representative Skyline point that the decision maker selects is then expanded, and a second decision is made among the Skyline points within that interval. This solves the problem that an oversized Skyline point set hinders decision-making and lets Skyline techniques better serve multi-objective decision-making. When the Skyline point set is known, the time complexity of the algorithm of the present invention is O(k² log³ m), far lower than that of the DRS algorithm.
Brief Description of the Drawings
Figure 1 is a schematic diagram of the algorithm process of the present invention;
Figure 2 is a schematic diagram of a point set;
Figure 3 is a schematic diagram of multiple Skyline points covered by a region [i, j];
Figure 4 shows the covering circles of the intervals [3, 6] and [5, 6];
Figure 5 shows how radius(i, j) for the interval [1, 5] changes with different choices of center(i, j).
Detailed Description
The implementation of the present invention is described in detail below with reference to the accompanying drawings.
A distance-based algorithm for computing a representative node set in two-dimensional space uses a binary parametric search technique, introduces a Testing function to test the feasibility of a distance, and uses a newly designed radius computation, which greatly reduces execution time. As shown in Figure 1, the process is as follows:
I. Preprocessing
(1) Compute the Skyline point set of the data set
For the given data set, the BNL algorithm is used to compute the Skyline point set that covers the entire data set.
The general framework of the algorithm is as follows:
Input: data set G = (x, y)
Output: Skyline point set
1.1 Sort all points in the data set by x value in descending order;
1.2 Add the first point of the sorted data set to the Skyline point set;
2 In order, compare the y value of the current point with the largest y value among the points already in the Skyline point set; if the current y value is larger, add the current point to the Skyline point set, otherwise do not add it. Repeat until all points have been examined.
3 Sort the Skyline point set and output it.
This preprocessing algorithm runs in O(n log n) time.
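The sort-and-scan steps above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: it assumes that larger values are preferred in both dimensions, which is what the comparison in step 2 implies, and the function name `skyline_sort_scan` and the sample data are hypothetical.

```python
def skyline_sort_scan(points):
    """Sort-and-scan Skyline for 2-D points, assuming larger x and larger y are both preferred."""
    # Step 1.1: x descending; ties broken by y descending so the best point of a column comes first.
    ordered = sorted(points, key=lambda p: (-p[0], -p[1]))
    result = []
    best_y = float("-inf")
    for x, y in ordered:
        # Step 2: keep the point only if its y exceeds every y seen so far
        # (all earlier points have x at least as large).
        if y > best_y:
            result.append((x, y))
            best_y = y
    return sorted(result)  # Step 3: output sorted by x ascending

data = [(1, 9), (2, 7), (2, 3), (4, 4), (3, 1), (7, 2), (9, 1), (5, 1)]
print(skyline_sort_scan(data))  # [(1, 9), (2, 7), (4, 4), (7, 2), (9, 1)]
```

The sort dominates the cost, so the sketch stays within the O(n log n) bound stated above.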
(2) Compute, for the Skyline points {p_i, …, p_j} covered by a region [i, j], the center point center(i, j) and the minimum radius radius(i, j) of the covering circle centered at that point, where m is the number of elements in the Skyline point set and 1 ≤ i ≤ j ≤ m. center(i, j) = p_u is defined such that u ∈ [i, j] minimizes max(||p_i, p_u||, ||p_u, p_j||), and radius(i, j) is that minimum value.
The general framework of the algorithm is as follows:
Input: subscripts i, j;
Output: center(i, j), radius(i, j);
1.1 If i = j, then center(i, j) = i and radius(i, j) = 0;
1.2 Otherwise, let left = i and right = j; while left < right:
1.2.1 Let mid1 = ⌊(left + right)/2⌋, FR = ||p_j, p_mid1||, FL = ||p_mid1, p_i||;
1.2.2 If FR < FL, set right = mid1;
1.2.3 Otherwise:
1.2.3.1 If FR > FL, set left = mid1 + 1;
1.2.3.2 Otherwise (FR = FL), return center(i, j) = mid1, radius(i, j) = ||p_j, p_mid1||;
1.3 mid1 = right;
1.4.1 If ||p_j, p_{mid1-1}|| < ||p_mid1, p_i||, set mid1 = mid1 - 1;
1.4.2 Otherwise, mid1 is unchanged;
1.5 Return center(i, j) = mid1, radius(i, j) = max(||p_mid1, p_i||, ||p_j, p_mid1||);
Computing the center point and radius(i, j) of the Skyline points {p_i, …, p_j} covered by the region [i, j] takes O(log m) time.
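The binary search for the center can be sketched in Python as below. The sketch assumes a Skyline sorted by x, so that the distance from p_i grows and the distance to p_j shrinks as the candidate center moves from i to j; it uses 0-based indices and the Euclidean distance, and the name `center_radius` and the sample points are hypothetical.

```python
from math import dist

def center_radius(sky, i, j):
    """Return (u, r) for the 0-based inclusive range [i, j] of a Skyline sorted by x.

    u minimises r = max(dist(sky[i], sky[u]), dist(sky[u], sky[j])).
    """
    def cover(u):
        return max(dist(sky[i], sky[u]), dist(sky[u], sky[j]))

    lo, hi = i, j
    while lo < hi:  # find the first u whose left distance is at least its right distance
        mid = (lo + hi) // 2
        if dist(sky[i], sky[mid]) >= dist(sky[mid], sky[j]):
            hi = mid
        else:
            lo = mid + 1
    # The optimum is either that crossover index or its left neighbour.
    candidates = [u for u in (lo - 1, lo) if i <= u <= j]
    best = min(candidates, key=cover)
    return best, cover(best)

sky = [(1, 9), (2, 7), (4, 4), (7, 2), (9, 1)]
print(center_radius(sky, 0, 4))  # center index 2, radius sqrt(34)
```

Each call performs O(log m) distance evaluations, matching the bound stated above.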
II. Computing the k representative Skyline points from the known Skyline point set
(1) Introduce the Testing(B) algorithm, which determines whether there exist k circles, each of radius radius(i, j) ≤ B for some 1 ≤ i ≤ j ≤ m, whose covered region contains the entire Skyline point set; it returns true if so and false otherwise.
The general framework of the algorithm is as follows:
Input: B, the sorted Skyline point set (m elements), and subscripts S_i, where 1 ≤ i ≤ m.
Output: true or false
1.1 Initialize p = 0;
1.2 While p < k-1:
1.2.1 Use the binary search BiNSRCH(S_p + 1, j, S_{p+1}) to find the largest S_{p+1} for which radius(S_p + 1, S_{p+1}) ≤ B; store center(S_p + 1, S_{p+1}) as a representative Skyline point under B, and set p += 1;
1.3 If radius(S_{k-1} + 1, S_k) ≤ B (with S_k = m, the last Skyline point), return true; otherwise return false.
(2) To find Er(κ, S) satisfying [equation missing in the source], first define k subscripts {S_0, S_1, …, S_{k-1}} with S_0 = 1 such that, for 1 ≤ i ≤ k-1, Testing(radius(S_{i-1}, S_i)) is false and Testing(radius(S_{i-1}, S_i + 1)) is true; then [equation missing in the source], and the corresponding k representative Skyline points are returned accordingly.
This algorithm applies the Distance-Testing strategy and is called the RSP algorithm. Its general framework is as follows:
Input: the sorted Skyline point set (m elements), and center(i, j), radius(i, j) for 1 ≤ i ≤ j ≤ m
Output: k representative Skyline points, B_opt.
1.1 Let {S_0, S_1, …, S_{k-1}} be the subscripts, where S_0 = 1;
1.2 for b <- 1 to P-1 do
1.2.1 ilow <- S_{b-1}; ihigh <- m;
1.2.2 while ilow < ihigh do
imid <- (ilow + ihigh)/2;
1.2.3 B <- radius(S_{i-1}, S_i + 1)
1.2.4 if Testing(B) then
ihigh <- imid;
1.2.5 else
ilow <- imid + 1;
1.2.6 S_{k-1} <- ihigh;
1.2.7 B_{k-1} <- radius(S_{k-1} + 1, S_{k-2})
B_P <- radius(S_{p-1}, m)
1.3 Return the k representative Skyline points;
The NDRS algorithm mainly applies the Distance-Testing technique, which computes the representative Skyline points ("RSP" for short) when the Skyline point set is already known; this scheme is called the DisBase algorithm.
The main idea of the DisBase algorithm is to first select k subscripts {S_0, S_1, …, S_{k-1}}, with S_0 = 0, from the subscripts of the m sorted Skyline points. These subscripts satisfy the following conditions for 1 ≤ i ≤ k-1: radius(1, S_1) < B_opt < radius(1, S_1 + 1), and radius(S_i + 1, S_{i+1}) < B_opt < radius(S_i + 1, S_{i+1} + 1), and the final [equation missing in the source]. This part has time complexity O(k log m · log m). Testing(B_opt) then returns the final k representative Skyline points (i.e. the RSP), which takes O(k·log m) time, so the time complexity of the algorithm is O(k² log³ m).
The specific idea of the NDRS algorithm and its proof of correctness follow.
Theorem 1: When B ≥ B_opt, Testing(B) is true.
Proof: Suppose that Testing(B) is false for some B ≥ B_opt. Then no k circles of radius no greater than B can completely cover the entire Skyline point set. Since B ≥ B_opt, every circle of radius no greater than B_opt also has radius no greater than B, so no k circles of radius no greater than B_opt can completely cover the Skyline point set either; that is, Testing(B_opt) is false, which contradicts the definition of B_opt. Theorem 1 therefore holds.
Theorem 2: Let B_opt be the smallest value for which k circles of radius no greater than B_opt can completely cover the Skyline point set; then [equation missing in the source].
Proof: By the definition of {S_0, S_1, …, S_{k-1}}, Testing(radius(S_{i-1} + 1, S_i + 1)) is true for every 1 ≤ i ≤ k-1. It therefore suffices to show that B_opt lies in [set missing in the source], that is, to show that the first Skyline point covered by the first circle C_1 has subscript 1 and that the first Skyline point covered by the i-th circle C_i (i ≠ 1) has subscript S_{i-1} + 1. Because radius(1, S_1) < B_opt, none of {2, …, S_1} can be the subscript of the first Skyline point covered by a circle corresponding to B_opt. By the same argument, none of {S_i + 2, …, S_{i+1}} can be the subscript of the first Skyline point covered by a circle corresponding to B_opt. Theorem 2 is thus proved.
In the prior art, techniques for computing the Skyline are already relatively mature. On the basis of the Skyline point set produced by preprocessing, the present invention uses the NDRS algorithm to further find k representative Skyline points, which greatly narrows the range the decision maker has to consider.
The point set in Figure 2 is now taken as an example to further illustrate the implementation of the algorithm.
Objective: find k = 3 representative Skyline points.
I. Preprocessing
1.1 Compute the Skyline point set of the data set
For the given data set {p_1, p_2, …, p_12}, the BNL algorithm is used to compute the Skyline point set {p_1, p_2, …, p_6} that covers the entire data set.
1.2 The center point center(i, j) and radius(i, j) of the Skyline points {p_i, …, p_j} covered by a region [i, j], as shown in Figure 3, can be computed as follows; this is an online algorithm, not an offline one.
Computing the center point and radius(i, j) of the Skyline points {p_i, …, p_j} covered by the region [i, j] takes O(log m) time.
Figure 4 shows the covering circles of the intervals [3, 6] and [5, 6], and Figure 5 shows how radius(i, j) for the interval [1, 5] changes with different choices of center(i, j). By comparing FR and FL, bisection quickly yields radius(1, 5) = 10 and center(1, 5) = 3.
II. Computing 3 representative Skyline points from the known Skyline point set {p_1, p_2, …, p_6}.
As shown in Figure 5, the preprocessing stage has produced the Skyline point set {p_1, p_2, …, p_6}, and the distance between any two Skyline points can be obtained in O(1) time. (Here ||p_1, p_2|| = 2, ||p_2, p_3|| = 4, ||p_3, p_4|| = 6, ||p_4, p_5|| = 4, ||p_5, p_6|| = 2.) Let the subscripts be {S_0, …, S_i}, 0 < i < k, where S_0 = 1 and k = 3. The values of S_1 and S_2 must then be found such that Testing(radius(S_{i-1}, S_i)) is false and Testing(radius(S_{i-1}, S_i + 1)) is true.
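The O(1) distance lookup and the radius values quoted for this example can be checked with a few lines of Python. The sketch assumes that distances are additive along the sorted Skyline (as with the stored Manhattan distances from the initial point), so a prefix-sum array is enough; the names `prefix`, `distance` and `center_radius` are hypothetical, and the center is found here by brute force rather than by bisection.

```python
from itertools import accumulate

gaps = [2, 4, 6, 4, 2]            # ||p1,p2||, ..., ||p5,p6|| from the example
prefix = [0, *accumulate(gaps)]   # prefix[u-1] = distance from p_1 to p_u

def distance(a, b):
    """Distance between p_a and p_b (1-based) in O(1), assuming additive distances."""
    return abs(prefix[b - 1] - prefix[a - 1])

def center_radius(i, j):
    """Brute-force (center index, radius) of the range [i, j], 1-based inclusive."""
    return min(((u, max(distance(i, u), distance(u, j))) for u in range(i, j + 1)),
               key=lambda t: t[1])

print(center_radius(1, 5))  # (3, 10)
print(center_radius(1, 3))  # (2, 4)
```

The first result agrees with radius(1, 5) = 10 and center(1, 5) = 3 stated for Figure 5.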
The specific steps are as follows:
1) Find S_1 and S_2 by bisection. Let ilow = 1, ihigh = 6; imid = (ilow + ihigh)/2 = 3;
2) Evaluate Testing(radius(1, 3)): the method in 1.2 gives radius(1, 3) = 4; let B = 4. Testing(B) can be evaluated by bisection:
2.1) Following the Testing pseudocode, the steps are:
1. Let ilow2 = 1, ihigh2 = 6; imid2 = (ilow2 + ihigh2)/2 = 3;
2. radius(1, 3) = B, so ilow2 = 4, ihigh2 = 6; imid2 = 5;
3. radius(4, 5) > B, so ilow2 = 4, ihigh2 = imid2 - 1 = 4;
4. radius(5, 6) = 2 < B; return true.
3) ilow = 1, ihigh = imid = 4;
3.1) imid = (ilow + ihigh)/2 = 2;
Testing(imid) = Testing(2); by the same method as in 2.1), Testing(2) is false, which gives S_1 = 2.
4) ilow = 3, ihigh = 6; in the same way, S_2 = 3.
5) Testing(B_opt) returns the corresponding center(i, j) values as the representative Skyline points, giving the representative Skyline points {p_2, p_4, p_5}.
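The result of the example can be cross-checked by exhaustive enumeration. The Python sketch below is a brute-force check rather than the NDRS procedure: it assumes the points can be placed at cumulative positions derived from the consecutive distances 2, 4, 6, 4, 2 (additive distances), and the names `pos` and `er` are hypothetical.

```python
from itertools import combinations

pos = [0, 2, 6, 12, 16, 18]  # positions of p1..p6 implied by the gaps 2, 4, 6, 4, 2

def er(reps):
    """Largest distance from any point to its nearest representative (0-based indices)."""
    return max(min(abs(pos[p] - pos[r]) for r in reps) for p in range(len(pos)))

best = min(er(c) for c in combinations(range(len(pos)), 3))
print(best)            # 4
print(er((1, 3, 4)))   # 4, so {p2, p4, p5} attains the minimum error
```

No choice of three representatives achieves an error below 4 on this data, because p_3 and p_4 each lie at least 4 away from every other point.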