CN113448994A - Continuous regret-ratio minimization query method based on core set - Google Patents

Continuous regret-ratio minimization query method based on core set

Info

Publication number
CN113448994A
Authority
CN
China
Prior art keywords: points, tuple, core set, core, minimization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110770688.1A
Other languages: Chinese (zh)
Other versions: CN113448994B (en)
Inventor
郑吉平
马炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202110770688.1A
Publication of CN113448994A
Application granted
Publication of CN113448994B
Active legal status
Anticipated expiration legal status

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/24: Querying
    • G06F 16/245: Query processing
    • G06F 16/2453: Query optimisation
    • G06F 16/2455: Query execution

Abstract

The invention discloses a continuous regret-ratio minimization query method based on a core set. The method first constructs an initial core set from the initial database and computes an initial regret-ratio minimization query result set; it then monitors the database for tuple changes, namely insertions of new tuples and deletions of existing tuples, and maintains the core set and the result set after each change, so that continuous regret-ratio minimization queries can be completed efficiently over a non-static database. By introducing the core-set technique and using nearest-neighbor search, a core set is constructed over the initial data set and a regret-ratio minimization query result set is computed from it; as tuples are inserted into and deleted from the database, the nearest-neighbor relation and the core set are updated and the latest result set is recomputed from them, so that a series of real-time regret-ratio minimization query result sets can be returned to the user continuously over a non-static database.

Description

Continuous regret-ratio minimization query method based on core set
Technical Field
The invention belongs to the technical field of databases, and particularly relates to a continuous regret-ratio minimization query method based on a core set.
Background
Extracting a few representative tuples from a database is an important function in many applications such as multi-criteria decision making, recommendation systems and web search. A regret-ratio minimization query selects a subset of fixed size so that the user is almost as satisfied with the subset as with the entire database; because the result size is controllable and the user need not supply complex preference information, it has been widely used in the related arts. However, as information technology develops, static databases are no longer realistic, and many applications require non-static databases in which data tuples are inserted and deleted. The problem of selecting representative tuples also arises in this type of application. For example, in a restaurant reservation system, a restaurant's operating status, per-capita price, etc. may change over time: a restaurant opening or closing can be regarded as the insertion or deletion of a tuple in the database, while a change in per-capita price can be regarded as a modification of a tuple, i.e. a deletion followed by an insertion. Clearly, the representative restaurants returned to the user differ at different points in time, so how to continuously obtain a regret-ratio minimization query result set from a dynamically changing database, to represent the real-time database, becomes an urgent problem.
For continuous regret-ratio minimization queries over a non-static database, the existing method [1] converts the query into a dynamic set-cover problem. First, the preferences of many users are obtained by random sampling; if a tuple has the highest score under a preference, the tuple is said to cover that preference. A set system consisting of a ground set and a collection of subsets is then constructed, where the ground set is the randomly sampled preference set and, for each tuple in the database, the set of preferences it covers forms one element of the subset collection. The original problem thus becomes a set-cover problem: select a fixed number of elements from the subset collection so that their union equals the ground set. The method performs a number of greedy iterations, each time adding to the result set the tuple whose addition covers a larger total number of preferences than the addition of any other tuple, until a result set of fixed size is obtained. If the result set does not cover all sampled preferences, a smaller number of preferences is randomly sampled and the process is repeated until a fixed-size result set covering all preferences is found, which is returned to the user. Afterwards, the set system and the result set are updated as tuples are inserted into or deleted from the database, which corresponds respectively to inserting and deleting elements of the subset collection, and the result set is updated so that it remains a set cover of the new set system. In this way the method returns a series of regret-ratio minimization query result sets as the tuples in the database change.
However, this method must update the set system after every tuple change, which makes it inefficient, and it must store the entire converted set system for subsequent updates, so the huge set system occupies a large amount of space.
The efficiency of the solution greatly affects continuous regret-ratio minimization queries over a non-static database: if the method is too slow, a previous change may not be fully processed when a new change arrives, which can cause congestion or even crash the system, and high space consumption places greater demands on the hardware running the method. The invention combines continuous regret-ratio minimization queries over a non-static database with the core set, omits the update operations that do not affect the core set, solves the problem efficiently, stores only the information related to the core set, and thus occupies little space.
The documents mentioned above originate from the following articles:
[1] Yanhao Wang, Yuchen Li, Raymond Chi-Wing Wong, Kian-Lee Tan. A Fully Dynamic Algorithm for k-Regret Minimizing Sets. In Proceedings of the 37th International Conference on Data Engineering (ICDE), pages 1631-1642, 2021.
Disclosure of Invention
The invention aims to provide a continuous regret-ratio minimization query method based on a core set, which constructs the core set over the initial data set by introducing the core-set technique and using nearest-neighbor search, computes a regret-ratio minimization query result set from the core set, updates the nearest-neighbor relation and the core set as tuples are inserted into and deleted from the database, and recomputes the latest result set from them, thereby efficiently and continuously returning a series of real-time regret-ratio minimization query result sets to the user over a non-static database.
In order to achieve the above purpose, the solution of the invention is:
a continuous regret-ratio minimization query method based on a core set comprises the following steps:
step 1, normalizing the original data set D of dimension d so that the attribute values of all tuples in D lie in the [0,1] interval;
step 2, constructing a core set C of original data;
step 3, for each point p in the core set, define and compute the set U_p of points in N whose nearest neighbor is p, where N is a set of points randomly sampled on the part of the surface, lying in the non-negative quadrant, of the d-dimensional sphere of radius √d centered at the origin;
step 4, calculating a regret-ratio minimization query result set R;
step 5, waiting for a tuple change in the database and preparing to process it; if no insertion or deletion of a tuple occurs in the database, the query process ends, otherwise step 6 is executed;
and 6, maintaining the core set and the regret-ratio minimization query result set.
The specific content of step 1 is as follows: first, search for and record the maximum attribute value in each dimension over all tuples of the original data set D and of the change sequence, denoted m_1, m_2, …, m_d respectively; then set each attribute value of every tuple to the original value divided by the recorded maximum of the corresponding dimension, i.e. p[i] ← p[i]/m_i, where p[i] denotes the attribute value of tuple p in the i-th dimension. Afterwards, the attribute values of all tuples lie in the [0,1] interval.
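To make step 1 concrete, here is a minimal Python sketch (the helper name, the tuple values and the representation of the change sequence are illustrative assumptions, not from the patent); note that, as the step specifies, the per-dimension maxima are taken over the initial data set together with the tuples of the change sequence:

```python
def normalize(dataset, changes):
    """Scale every attribute of every tuple into [0, 1] (step 1 sketch).

    The maximum m_i of each dimension i is recorded over the initial
    dataset and over the tuples appearing in the change sequence, so
    that later insertions also fall inside [0, 1].
    """
    d = len(dataset[0])
    pool = dataset + [t for op, t in changes if op == "insert"]
    maxima = [max(t[i] for t in pool) for i in range(d)]
    normalized = [tuple(t[i] / maxima[i] for i in range(d)) for t in dataset]
    return normalized, maxima

# Hypothetical tuples: the recorded maxima become 160 and 150.
data, maxima = normalize([(100, 45), (48, 150)], [("insert", (160, 30))])
```

With these values, the tuple (100, 45) is mapped to (0.625, 0.3).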
The specific content of step 2 is as follows:
2-1, let C = ∅;
2-2, on the part of the surface, lying in the non-negative quadrant, of the d-dimensional sphere of radius √d centered at the origin of coordinates, sample a number of points u, and denote the set they form by N; specifically, first obtain d non-negative random numbers, denoted u_1, u_2, …, u_d, then let u_i ← √d · u_i / √(u_1² + u_2² + … + u_d²), so that the tuple (u_1, u_2, …, u_d) represents a point randomly sampled on that part of the sphere surface; add the obtained point to the set N and repeat the process several times, finally obtaining a set N of points randomly sampled on the part of the surface, lying in the non-negative quadrant, of the d-dimensional sphere of radius √d;
2-3, for every point u in N, search for and record its nearest neighbor NN(u) in the normalized data, i.e. the point of the normalized initial data closest to u in Euclidean distance, namely NN(u) = argmin_{p∈D} ||u − p||, and add it to the core set, obtaining the core set C = ∪_{u∈N} NN(u).
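A minimal sketch of steps 2-1 to 2-3, assuming the data has already been normalized into [0,1]^d; the function names and the use of Python's `random` and `math` modules are illustrative, and the rescaling of a random non-negative vector onto the sphere of radius √d follows the construction described above:

```python
import math
import random

def sample_sphere_point(d, rng=random):
    # Draw d non-negative random numbers and rescale the vector so it
    # lies on the surface of the d-dimensional sphere of radius sqrt(d),
    # inside the non-negative quadrant (step 2-2).
    v = [rng.random() for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(math.sqrt(d) * x / norm for x in v)

def build_coreset(data, num_samples, d, rng=random):
    # Step 2-1: C starts empty; step 2-3: C is the union of NN(u), u in N.
    N = [sample_sphere_point(d, rng) for _ in range(num_samples)]
    nn = {u: min(data, key=lambda p: math.dist(u, p)) for u in N}
    C = set(nn.values())
    return N, nn, C
```

Every sampled u then satisfies ||u|| = √d, and the core set C is typically much smaller than the data set.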
The specific process of step 3 is as follows: for each point p in the core set, traverse all points u in N and judge whether p is the nearest neighbor of u; if so, i.e. p = NN(u), add u to U_p.
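Step 3 is a simple inversion of the recorded nearest-neighbor mapping; a sketch follows, where string labels stand in for the actual points and mirror the relations used later in the embodiment:

```python
def group_by_nearest(nn):
    # U_p = { u in N | NN(u) = p }, one entry per core-set point (step 3).
    U = {}
    for u, p in nn.items():
        U.setdefault(p, []).append(u)
    return U

# Nearest-neighbor relations of the form NN(u) = p.
U = group_by_nearest({"u1": "p1", "u2": "p2", "u3": "p3",
                      "u4": "p1", "u5": "p2"})
```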
The specific process of step 4 is as follows:
4-1, let R = ∅;
4-2, calculate the coverage value cov(R) of the result set as the size of the union of the sets U_p over all points in the result set, i.e. cov(R) = |∪_{p∈R} U_p|; cov(R) indicates how many points of N have their nearest neighbor in R;
4-3, traverse all points p in the core set and find the point p′ whose addition increases the coverage value of the result set more than the addition of any other point, namely:
p′ = argmax_{p∈C} (cov(R ∪ {p}) − cov(R)),
and add p′ to the result set;
4-4, repeat steps 4-2 to 4-3 until the size of the result set equals the size specified by the user, then go to step 4-5;
4-5, record the regret-ratio minimization query result set obtained in step 4-4 and return it to the user.
In step 4-3, if several points attain the maximum coverage-value increment simultaneously, the point is selected in subscript order.
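Steps 4-1 to 4-5 form a standard greedy maximum-coverage loop over the sets U_p; a sketch in Python, where C is an ordered list so that ties are broken in subscript order as step 4-3 specifies, and the sample sets mirror Table 6 of the embodiment:

```python
def regret_query(C, U, k):
    # Greedily add the core-set point whose U_p enlarges the covered
    # part of N the most; cov(R) = |union of U_p over p in R| (step 4-2).
    R, covered = [], set()
    while len(R) < k and len(R) < len(C):
        best = max((p for p in C if p not in R),
                   key=lambda p: len(covered | set(U.get(p, ()))))
        R.append(best)
        covered |= set(U.get(best, ()))
    return R

U = {"p1": ["u1", "u4"], "p2": ["u2", "u5"], "p3": ["u3"]}
R = regret_query(["p1", "p2", "p3"], U, 2)
```

Because Python's `max` keeps the first maximal element, the tie between p1 and p2 in the first round is resolved in favor of p1, matching the embodiment.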
The specific content of step 6 is as follows:
6-1, judge whether the core set needs maintenance, separately for the two cases of tuple insertion and tuple deletion; if so, go to step 6-2, and if not, go to step 5;
6-2, adjust the core set and the sets U_p according to the new nearest-neighbor relation, then go to step 4.
In step 6, for a tuple insertion, the specific process is as follows:
A6-1, normalize the tuple p; set up a set I and initialize it to be empty; for every point u in N, if the distance from u to its recorded nearest neighbor is greater than the distance from u to p, add u to I; if I is not empty, the core set needs maintenance, so go to step A6-2; if I is empty, no maintenance is needed, so go to step 5;
A6-2, insert tuple p into the core set and let U_p = I; for every point u in I, do the following: let q = NN(u) be the previous nearest neighbor of u, remove u from U_q and set NN(u) = p; judge whether U_q is now empty, and if so, remove q from the core set.
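A sketch of the insertion maintenance (steps A6-1 and A6-2), assuming p has already been normalized and that N, the nearest-neighbor map nn, the sets U and the core set C come from the construction above; the data-structure choices and the scenario values are illustrative:

```python
import math

def handle_insert(p, N, nn, U, C):
    # A6-1: I = sampled points now closer to p than to their recorded
    # nearest neighbor.
    I = [u for u in N if math.dist(u, nn[u]) > math.dist(u, p)]
    if not I:
        return False            # core set needs no maintenance
    # A6-2: p joins the core set and takes over the points in I.
    C.add(p)
    U[p] = list(I)
    for u in I:
        q = nn[u]               # previous nearest neighbor of u
        U[q].remove(u)
        nn[u] = p
        if not U[q]:            # q no longer represents any sampled point
            C.discard(q)
            del U[q]
    return True

# Hypothetical 1-sample scenario: u's neighbor moves from (0.0, 0.0) to p.
N = [(1.0, 0.4)]
nn = {(1.0, 0.4): (0.0, 0.0)}
U = {(0.0, 0.0): [(1.0, 0.4)]}
C = {(0.0, 0.0)}
changed = handle_insert((0.9, 0.4), N, nn, U, C)
```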
In step 6, for a tuple deletion, the specific process is as follows:
B6-1, judge whether the tuple p is in the core set; if so, the core set needs maintenance, so go to step B6-2; otherwise no maintenance is needed, so go to step 5;
B6-2, remove p from the core set; the previous nearest neighbor of every point in U_p has been deleted, so for every point u in U_p do the following: search for its nearest neighbor in the new data set, denote it q, remove u from U_p and add it to U_q, and set NN(u) = q; judge whether q is already in the core set, and if not, add q to the core set.
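The deletion case (steps B6-1 and B6-2) in the same sketch style; `data` is the data set after p has been removed, and the structures are as above (an illustrative layout, not the patent's literal data structures):

```python
import math

def handle_delete(p, data, nn, U, C):
    # B6-1: only the deletion of a core-set point requires maintenance.
    if p not in C:
        return False
    # B6-2: every point in U_p lost its nearest neighbor; re-attach each
    # one to its nearest neighbor q in the remaining data.
    C.discard(p)
    for u in U.pop(p):
        q = min(data, key=lambda t: math.dist(u, t))
        nn[u] = q
        U.setdefault(q, []).append(u)
        C.add(q)                # no-op if q is already in the core set
    return True

# Hypothetical scenario: deleting (1.0, 1.0) re-attaches u to (0.5, 0.5).
data = [(0.0, 0.0), (0.5, 0.5)]
nn = {(1.2, 1.2): (1.0, 1.0)}
U = {(1.0, 1.0): [(1.2, 1.2)]}
C = {(1.0, 1.0)}
changed = handle_delete((1.0, 1.0), data, nn, U, C)
```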
Compared with the prior art, the above scheme of the invention has the following beneficial effects:
(1) by updating the regret-ratio minimization query result set on the basis of core-set maintenance, part of the update operations that do not affect the core set are omitted; compared with the prior art, which must update the set system for every change, the average processing time per data change is greatly reduced and continuous regret-ratio minimization queries can be completed more efficiently;
(2) the prior art must store a huge set system during processing, occupying a large amount of storage space; in contrast, the present method has no such large storage requirement.
Drawings
FIG. 1 is an overall flow diagram of the present invention;
FIG. 2 is a schematic diagram of a core set construction process for a 2-dimensional case;
the dots on the 1/4 circle in the figure represent the randomly sampled points u; the points in the square frame represent the normalized tuples of the database; a connecting line represents the nearest-neighbor relation between two points, where the nearest neighbor of the starting point u is the end point NN(u); the dots circled with dotted lines represent the points selected into the core set;
FIG. 3 is a flow diagram of computing a result set;
FIG. 4 is a diagram illustrating the nearest-neighbor relationships between points at each stage in an embodiment of the invention;
wherein u_i denotes the 5 randomly sampled points of the embodiment, which lie on the 1/4 circle, in the non-negative quadrant, of radius √2 centered at the origin; p_i denotes the normalized tuples of the database, whose coordinates all lie in the [0,1] interval, so they lie in the square frame formed by the coordinate axes and the dashed lines; a solid connecting line represents the nearest-neighbor relation between two points, where the nearest neighbor of the starting point is the end point; a dotted connecting line represents a nearest-neighbor relation deleted by the current tuple change; a dashed connecting line represents a nearest-neighbor relation added by the current tuple change;
in FIG. 4, (a) shows the nearest-neighbor relation on the initial data set, (b) shows the nearest-neighbor relation after inserting p_6, (c) shows the nearest-neighbor relation after inserting p_7, and (d) shows the nearest-neighbor relation after deleting p_4 and p_2.
Detailed Description
The invention provides a continuous regret-ratio minimization query method based on a core set: it first constructs an initial core set from the initial database and computes an initial regret-ratio minimization query result set; it then monitors the database for tuple changes, namely insertions of new tuples and deletions of existing tuples, and maintains the core set and the result set after each change, thereby completing continuous regret-ratio minimization queries efficiently over a non-static database.
As shown in fig. 1, the present invention comprises the steps of:
step 1, normalize the original data set D of dimension d so that the attribute values of all tuples lie in the [0,1] interval;
the specific operation of step 1 is as follows: first, search for and record the maximum attribute value in each dimension over all tuples of the data set D and of the change sequence, denoted m_1, m_2, …, m_d respectively; then set each attribute value of every tuple to the original value divided by the recorded maximum of the corresponding dimension, i.e. p[i] ← p[i]/m_i, where p[i] denotes the attribute value of tuple p in the i-th dimension; after conversion, the attribute values of all tuples lie in the [0,1] interval;
step 2, construct the core set C of the original data; as shown in fig. 2, this mainly includes the following steps:
2-1, initialize the core set to the empty set, i.e. let C = ∅;
2-2, on the part of the surface, lying in the non-negative quadrant, of the d-dimensional sphere of radius √d centered at the origin of coordinates, sample a number of points u, and denote the set they form by N; specifically, first obtain d non-negative random numbers, denoted u_1, u_2, …, u_d, then let u_i ← √d · u_i / √(u_1² + u_2² + … + u_d²), so that the tuple (u_1, u_2, …, u_d) represents a point randomly sampled on that part of the sphere surface; add the obtained point to the set N and repeat the process several times, finally obtaining a set N of points randomly sampled on the part of the surface, lying in the non-negative quadrant, of the d-dimensional sphere of radius √d;
2-3, for every point u in N, search for and record its nearest neighbor NN(u) in the normalized data, i.e. the point of the normalized initial data closest to u in Euclidean distance, namely NN(u) = argmin_{p∈D} ||u − p||, and add it to the core set, obtaining the core set C = ∪_{u∈N} NN(u).
Step 3, for each point p in the core set, define and compute the set U_p of points in N whose nearest neighbor is p, i.e. U_p = {u ∈ N | p = NN(u)}; the specific process is as follows: for a point p, traverse all points u in N and judge, according to the nearest-neighbor relation recorded in step 2-3, whether p is the nearest neighbor of u; if so, i.e. p = NN(u), add u to U_p. Each point in the core set corresponds to one set U_p, which represents the set of all points in N whose nearest neighbor is p;
step 4, calculate the regret-ratio minimization query result set R; with reference to fig. 3, this specifically includes the following process:
4-1, initialize the regret-ratio minimization query result set to the empty set, i.e. let R = ∅;
4-2, calculate the coverage value cov(R) of the result set as the size of the union of the sets U_p over all points in the result set, i.e. cov(R) = |∪_{p∈R} U_p|; cov(R) indicates how many points of N have their nearest neighbor in R;
4-3, traverse all points p in the core set and find the point p′ whose addition increases the coverage value of the result set more than the addition of any other point, namely:
p′ = argmax_{p∈C} (cov(R ∪ {p}) − cov(R)),
and add p′ to the result set; if several points attain the maximum coverage-value increment simultaneously, the point is selected in subscript order, although when the number of random sampling points is large, i.e. the set N is large, this situation hardly occurs;
4-4, repeat steps 4-2 to 4-3 until the size of the result set equals the size specified by the user, then go to step 4-5;
4-5, record the regret-ratio minimization query result set obtained in step 4-4 and return it to the user;
step 5, wait for a tuple change in the database and prepare to process it; if no insertion or deletion of a tuple occurs in the database, the query process ends, otherwise step 6 is executed;
step 6, maintain the core set and the regret-ratio minimization query result set, which specifically comprises:
6-1, judge whether the core set needs maintenance for the tuple insertion or deletion at hand, analyzed in the following two cases; if the core set needs maintenance, execute step 6-2, and if not, go to step 5:
a) insertion of tuple p: set each attribute value of p to the original value divided by the maximum of the corresponding dimension recorded in step 1, i.e. normalize p; then set up a set I and initialize it to be empty; for every point u in N, if the distance from u to its recorded nearest neighbor is greater than the distance from u to p, i.e. ||u − NN(u)|| > ||u − p||, add u to I, so that I represents the set of points whose nearest neighbor is affected by the insertion of tuple p, i.e. the points in I take p as their new nearest neighbor; if I is not empty, the core set needs maintenance, and if I is empty, no maintenance is needed;
b) deletion of tuple p: if p is in the core set, the core set needs maintenance, otherwise no maintenance is needed.
6-2, adjust the core set and the sets U_p according to the new nearest-neighbor relation, then go to step 4; the adjustment differs for the two cases of tuple insertion and deletion:
a) insertion of tuple p: insert the point p into the core set and let U_p = I, where I is the set obtained in step 6-1; the new nearest neighbor of every point in I has become p, and for every point u in I the following is performed: let q = NN(u) be the previous nearest neighbor of u, remove u from U_q and set NN(u) = p; if U_q is now empty, remove q from the core set;
b) deletion of tuple p: remove the point p from the core set; the previous nearest neighbor of every point in U_p has been deleted, so for every point u in U_p do the following: search for its nearest neighbor in the new data set, denote it q, remove u from U_p and add it to U_q, and set NN(u) = q; if q is not already in the core set, add q to the core set.
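Putting steps 1 to 6 together, the continuous query can be sketched as a generator that yields one result set for the initial database and one after every change; this is a compact, self-contained illustration under the same assumptions as the sketches above, with sampling, tie-breaking order and data structures simplified:

```python
import math
import random

def continuous_query(data, changes, num_samples, k, seed=0):
    """Yield a result set for the initial data and after each change,
    where `changes` is a list of ("insert"/"delete", tuple) pairs."""
    rng = random.Random(seed)
    d = len(next(iter(data)))
    data = set(data)

    # Step 2: sample N on the sphere of radius sqrt(d), non-negative quadrant.
    N = []
    for _ in range(num_samples):
        v = [rng.random() for _ in range(d)]
        s = math.sqrt(sum(x * x for x in v))
        N.append(tuple(math.sqrt(d) * x / s for x in v))

    nn = {u: min(data, key=lambda p: math.dist(u, p)) for u in N}
    U = {}
    for u, p in nn.items():
        U.setdefault(p, []).append(u)
    C = set(U)

    def greedy():
        # Step 4: greedy maximum coverage over the U_p sets.
        R, cov = [], set()
        while len(R) < k and len(R) < len(C):
            best = max((p for p in C if p not in R),
                       key=lambda p: len(cov | set(U[p])))
            R.append(best)
            cov |= set(U[best])
        return R

    yield greedy()
    for op, p in changes:
        if op == "insert":                      # steps A6-1 / A6-2
            data.add(p)
            I = [u for u in N if math.dist(u, nn[u]) > math.dist(u, p)]
            if I:
                C.add(p)
                U[p] = I
                for u in I:
                    q = nn[u]
                    U[q].remove(u)
                    nn[u] = p
                    if not U[q]:
                        C.discard(q)
                        del U[q]
        else:                                   # steps B6-1 / B6-2
            data.discard(p)
            if p in C:
                C.discard(p)
                for u in U.pop(p):
                    q = min(data, key=lambda t: math.dist(u, t))
                    nn[u] = q
                    U.setdefault(q, []).append(u)
                    C.add(q)
        yield greedy()

results = list(continuous_query({(0.1, 0.1), (0.9, 0.9)},
                                [("insert", (0.95, 0.95)),
                                 ("delete", (0.9, 0.9))],
                                num_samples=10, k=1))
```

Only changes that alter some nearest-neighbor relation (insertion) or remove a core-set point (deletion) trigger maintenance; all other changes are skipped, which is the source of the method's efficiency gain.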
The technical solution of the present invention will be described in detail by a specific example.
This embodiment assumes that the data dimension equals 2, i.e. d = 2, and that the result-set size required by the user is 2. First assume that the initial database contains the tuples p_1, p_2, p_3, p_4, p_5, whose attribute values are as follows:
TABLE 1
(table available only as an image in the original document)
Assume that the database changes four times in total, in order: insert p_6(90,105), insert p_7(160,30), delete p_4, delete p_2.
Step (1): search for the maximum attribute value in each dimension over all tuples of the initial database and of the change sequence, obtaining m_1 = p_7[1] = 160 and m_2 = p_2[2] = 150; set each attribute value of every tuple to the original value divided by the recorded maximum of the corresponding dimension, e.g. p_1[1] = 100/160 = 0.625, and so on for the remaining values. The normalized attribute values of the tuples are then as follows:
TABLE 2
(table available only as an image in the original document)
Step (2-1): let C = ∅.
Step (2-2): obtain d = 2 non-negative random numbers; assuming u_1 = 71 and u_2 = 29 are obtained, rescaling the vector (71, 29) onto the sphere surface yields the sampled point, which is added to N.
This is repeated several times (5 times in this embodiment); assume that the points of the finally obtained set N are as follows:
TABLE 3
(table available only as an image in the original document)
Step (2-3): ||u_1 − p_1|| = 1.814, ||u_1 − p_2|| = 2.498, ||u_1 − p_3|| = 2.085, ||u_1 − p_4|| = 1.955, ||u_1 − p_5|| = 2.273; comparing these values, p_1 is closest to u_1, so the nearest neighbor of u_1 is NN(u_1) = p_1, and p_1 is added to the core set. Similarly, the distance information between the points is as follows:
TABLE 4
(table available only as an image in the original document)
This yields NN(u_2) = p_2, NN(u_3) = p_3, NN(u_4) = p_1, NN(u_5) = p_2; the points and their nearest neighbors are shown in FIG. 4(a). The nearest neighbor of each point at this time is recorded as follows:
TABLE 5
(table available only as an image in the original document)
The core set is thus C = {p_1, p_2, p_3}.
Step (3): since p_1 = NN(u_1) and p_1 = NN(u_4), U_{p1} = {u_1, u_4}; similarly U_{p2} = {u_2, u_5} and U_{p3} = {u_3}. The U_p information of each core-set point at this time is as follows:
TABLE 6
(table available only as an image in the original document)
Step (4-1): let R = ∅.
Step (4-2): calculate cov(R) = 0.
Step (4-3): cov(R ∪ p_1) − cov(R) = |U_{p1}| − 0 = 2, cov(R ∪ p_2) − cov(R) = |U_{p2}| − 0 = 2, cov(R ∪ p_3) − cov(R) = |U_{p3}| − 0 = 1; here p_1 and p_2 attain the maximum coverage-value increment simultaneously, so p_1 is chosen in subscript order and added to the result set, giving R = {p_1}.
Step (4-4): the size of R is 1 and the user-specified result-set size is 2, so steps (4-2) to (4-3) are repeated.
Step (4-2): calculate cov(R) = |U_{p1}| = 2.
Step (4-3): cov(R ∪ p_1) − cov(R) = |U_{p1} ∪ U_{p1}| − |U_{p1}| = 2 − 2 = 0, cov(R ∪ p_2) − cov(R) = |U_{p1} ∪ U_{p2}| − |U_{p1}| = 4 − 2 = 2, cov(R ∪ p_3) − cov(R) = |U_{p1} ∪ U_{p3}| − |U_{p1}| = 3 − 2 = 1, so p_2 is added to the result set, giving R = {p_1, p_2}.
Step (4-4): the size of R is 2, equal to the user-specified result-set size, so go to step (4-5).
Step (4-5): record and return the current regret-ratio minimization query result set R = {p_1, p_2}.
Step (5): the change "insert p_6(90,105)" occurs in the database, so step (6-1) is executed.
Step (6-1): normalize the attribute values of p_6, obtaining p_6(90/160, 105/150), i.e. p_6(0.563, 0.7). The distances between p_6 and each point in N are as follows:
TABLE 7
(table available only as an image in the original document)
Because ||u_1 − NN(u_1)|| = ||u_1 − p_1|| = 1.848 < ||u_1 − p_6|| = 1.979, u_1 is not added to I; similarly:
||u_2 − NN(u_2)|| = ||u_2 − p_2|| = 1.420 < ||u_2 − p_6|| = 1.804, so u_2 is not added to I;
||u_3 − NN(u_3)|| = ||u_3 − p_3|| = 1.565 > ||u_3 − p_6|| = 1.524, so u_3 is added to I;
||u_4 − NN(u_4)|| = ||u_4 − p_1|| = 1.722 > ||u_4 − p_6|| = 1.685, so u_4 is added to I;
||u_5 − NN(u_5)|| = ||u_5 − p_2|| = 1.464 < ||u_5 − p_6|| = 1.573, so u_5 is not added to I.
Thus I = {u_3, u_4}, which is not empty, so core-set maintenance is performed and step (6-2) is executed.
Step (6-2): add p_6 to the core set and let U_{p6} = I = {u_3, u_4}; operate on each point in I separately. For u_3: q = NN(u_3) = p_3, remove u_3 from U_{p3}; U_{p3} is now empty, so remove p_3 from the core set. For u_4: q = NN(u_4) = p_1, remove u_4 from U_{p1}; U_{p1} = {u_1} is not empty. The U_p information of each core-set point at this time is as follows:
TABLE 8
(table available only as an image in the original document)
The points and their nearest neighbors are shown in FIG. 4(b). The updated nearest neighbor of each point at this time is as follows:
TABLE 9
(table available only as an image in the original document)
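The first maintenance operation can be replayed directly from the printed distances; the following sketch (string labels for the points, distances copied from the comparisons above) reproduces I = {u_3, u_4}, the removal of p_3, and the updated core set:

```python
# Distances ||u_i - NN(u_i)|| before the change and ||u_i - p6|| after,
# as listed in the embodiment.
old_dist = {"u1": 1.848, "u2": 1.420, "u3": 1.565, "u4": 1.722, "u5": 1.464}
new_dist = {"u1": 1.979, "u2": 1.804, "u3": 1.524, "u4": 1.685, "u5": 1.573}
nn = {"u1": "p1", "u2": "p2", "u3": "p3", "u4": "p1", "u5": "p2"}
U = {"p1": ["u1", "u4"], "p2": ["u2", "u5"], "p3": ["u3"]}
C = {"p1", "p2", "p3"}

# Step (6-1): collect the sampled points whose nearest neighbor moves to p6.
I = [u for u in old_dist if old_dist[u] > new_dist[u]]

# Step (6-2): insert p6, repair the affected U_q sets, drop empty ones.
C.add("p6")
U["p6"] = list(I)
for u in I:
    q = nn[u]
    U[q].remove(u)
    nn[u] = "p6"
    if not U[q]:
        C.discard(q)
        del U[q]
```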
Then the process proceeds to step (4a).
Step (4a) is executed similarly to step (4) above and is not repeated; the regret-ratio minimization query result set at this time, R = {p_2, p_6}, is returned.
(At this point the first change has been processed.)
Step (5a): the change "insert p_7(160,30)" occurs in the database, so step (6-1a) is executed.
Step (6-1a): normalize the attribute values of p_7, obtaining p_7(160/160, 30/150), i.e. p_7(1, 0.2). The distances between p_7 and each point in N are as follows:
TABLE 10
(table available only as an image in the original document)
Because ||u_1 − NN(u_1)|| = ||u_1 − p_1|| = 1.848 > ||u_1 − p_7|| = 1.428, u_1 is added to I; similarly:
||u_2 − NN(u_2)|| = ||u_2 − p_2|| = 1.420 < ||u_2 − p_7|| = 2.429, so u_2 is not added to I;
||u_3 − NN(u_3)|| = ||u_3 − p_6|| = 1.524 < ||u_3 − p_7|| = 1.665, so u_3 is not added to I;
||u_4 − NN(u_4)|| = ||u_4 − p_6|| = 1.685 > ||u_4 − p_7|| = 1.425, so u_4 is added to I;
||u_5 − NN(u_5)|| = ||u_5 − p_2|| = 1.464 < ||u_5 − p_7|| = 2.036, so u_5 is not added to I.
Thus I = {u_1, u_4}, which is not empty, so core-set maintenance is performed and step (6-2a) is executed.
Step (6-2a): add p7 to the core set and let Up7 = I = {u1, u4}. Each point in I is processed in turn: for u1, q = NN(u1) = p1, so u1 is removed from Up1; Up1 is now empty, so p1 is removed from the core set. For u4, q = NN(u4) = p6, so u4 is removed from Up6; Up6 = {u3} is not empty, so p6 remains. At this point, the set Up of each point in the core set is as follows:
TABLE 11
core point    Up
p2            {u2, u5}
p6            {u3}
p7            {u1, u4}
The relationship between the points and their nearest neighbors is shown in FIG. 4(c). The updated nearest neighbor of each point is as follows:
TABLE 12
point    nearest neighbor
u1       p7
u2       p2
u3       p6
u4       p7
u5       p2
Then, the process proceeds to step (4b).
Step (4b) is executed (same as step (4) above), and the regret-ratio minimization query result set at this point, R = {p2, p7}, is returned.
(This completes the second change processing.)
Step (5b): the tuple p4 is deleted from the database; step (6-1b) is executed.
Step (6-1b): because p4 is not in the core set, no core set maintenance is needed, and step (5c) is executed.
(This completes the third change processing.)
Step (5c): the tuple p2 is deleted from the database; step (6-1c) is executed.
Step (6-1c): p2 is in the core set, so core set maintenance is required; step (6-2c) is executed.
Step (6-2c): remove p2 from the core set. Up2 = {u2, u5}, and a new nearest neighbor must be found for each of these points; the relevant distance information is as follows:
TABLE 13
Figure BDA0003152836470000141
For u2, searching for its nearest neighbor in the latest data set yields NN(u2) = p5; since p5 is not yet in the core set, p5 is added to the core set and u2 is added to Up5. For u5, searching for its nearest neighbor in the latest data set yields NN(u5) = p5; p5 is already in the core set (it was added when u2 was processed), so u5 is simply added to Up5. At this point, the set Up of each point in the core set is as follows:
TABLE 14
core point    Up
p5            {u2, u5}
p6            {u3}
p7            {u1, u4}
The relationship between the points and their nearest neighbors is shown in FIG. 4(d). The updated nearest neighbor of each point is as follows:
TABLE 15
point    nearest neighbor
u1       p7
u2       p5
u3       p6
u4       p7
u5       p5
Then, the process proceeds to step (4d).
Step (4d) is executed (same as step (4) above), and the regret-ratio minimization query result set at this point, R = {p5, p7}, is returned.
(This completes the fourth change processing.)
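The deletion handling of steps (6-1b/c) and (6-2c) can be sketched in the same style. Again, this is a minimal sketch under assumed names and containers, not the patent's own implementation.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def handle_delete(p, data, nn, U, core):
    """Steps (6-1b/c)/(6-2c): p is the deleted tuple and data is the
    data set after the deletion; every sampled point that took p as
    nearest neighbor is re-assigned, possibly promoting new points
    into the core set."""
    if p not in core:
        return              # (6-1b): p is not in the core set, nothing to do
    core.discard(p)
    orphans = U.pop(p, set())
    for u in orphans:
        # re-search the nearest neighbor of u in the remaining data
        q = min(data, key=lambda t: dist(u, t))
        if q not in core:   # promote q into the core set if needed
            core.add(q)
            U[q] = set()
        U[q].add(u)
        nn[u] = q
```

As in the worked example, a deleted non-core tuple costs nothing, while a deleted core point triggers one nearest-neighbor re-search per point in its Up set.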
Step (5d): the database is not changed any more, and the method ends.
In summary, the core-set-based continuous regret-ratio minimization query method of the present invention accounts for the influence of tuple insertions and deletions in the database on the regret-ratio minimization query result. It constructs a core set based on nearest neighbor search and performs the regret-ratio minimization query by combining the core set with a maximum coverage method. Starting from the initial core set and the initial regret-ratio minimization query result set, whenever a tuple is inserted into or deleted from the database, the method judges whether the core set must change; if so, it triggers maintenance of the core set and an update of the regret-ratio minimization query result set, thereby satisfying the requirement of continuous regret-ratio minimization queries. The invention uses the nearest neighbor relationships within the core set to adjust the core set quickly, which effectively reduces maintenance time and improves the efficiency of continuous regret-ratio minimization queries over a non-static database.
The above embodiments are intended only to illustrate the technical idea of the present invention; they do not limit its protection scope, and any modification made to the technical scheme on the basis of this technical idea falls within the protection scope of the present invention.

Claims (9)

1. A continuous regret-ratio minimization query method based on a core set, characterized by comprising the following steps:
step 1, normalizing an original data set D of dimension d so that the attribute values of all tuples in D lie in the [0,1] interval;
step 2, constructing a core set C of the original data;
step 3, for every point p in the core set, constructing and computing the set Up of points in N whose nearest neighbor is p, where N is a set consisting of a plurality of points randomly sampled on the surface, within the non-negative quadrant, of a d-dimensional sphere of radius
Figure FDA0003152836460000011
;
step 4, computing the regret-ratio minimization query result set R;
step 5, waiting for a tuple change in the database: if no tuple insertion or deletion occurs, the query process ends; otherwise, step 6 is executed;
step 6, maintaining the core set and the regret-ratio minimization query result set.
2. The core-set-based continuous regret-ratio minimization query method of claim 1, wherein the specific content of step 1 is as follows: first, the maximum attribute value in each dimension over all tuples of the original data set D and the change sequence is found and recorded, the maxima being denoted m1, m2, …, md respectively; then, for every tuple, the attribute value of each dimension is set to the original attribute value divided by the recorded maximum of the corresponding dimension, i.e.
Figure FDA0003152836460000012
p[i] = p[i]/mi, where p[i] denotes the attribute value of tuple p in the i-th dimension; after this, the attribute values of all tuples lie in the [0,1] interval.
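Assuming tuples are held as a Python list of equal-length tuples, the per-dimension normalization described in this claim can be sketched as follows (the function name and data representation are assumptions):

```python
def normalize(tuples):
    """Divide each attribute by the recorded per-dimension maximum m_i
    (p[i] = p[i] / m_i), so every value lands in the [0, 1] interval."""
    d = len(tuples[0])
    m = [max(t[i] for t in tuples) for i in range(d)]   # m_1, ..., m_d
    return [tuple(t[i] / m[i] for i in range(d)) for t in tuples]
```

With the maxima 160 and 150 of the worked example in the description, the tuple (90, 105) normalizes to (0.5625, 0.7), matching p6.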
3. The core-set-based continuous regret-ratio minimization query method of claim 1, wherein the specific content of step 2 is as follows:
2-1, let
Figure FDA0003152836460000013
2-2, sample a plurality of points u on the surface, within the non-negative quadrant, of a d-dimensional sphere centered at the coordinate origin with radius
Figure FDA0003152836460000014
and denote the set formed by these points N; specifically, first obtain d non-negative random numbers, denoted u1, u2, …, ud, then let
Figure FDA0003152836460000015
so that the tuple (u1, u2, …, ud) represents a point randomly sampled, within the non-negative quadrant, on the surface of a d-dimensional sphere of radius
Figure FDA0003152836460000021
; this point is added to the set N, and the process is repeated several times to obtain the set N consisting of a plurality of points randomly sampled on the surface, within the non-negative quadrant, of a d-dimensional sphere of radius
Figure FDA0003152836460000022
;
2-3, for every point u in N, find and record its nearest neighbor NN(u) in the normalized data, i.e., the point in the normalized initial data closest to u in Euclidean distance; that is, let
Figure FDA0003152836460000023
and add it to the core set, thereby obtaining the core set C = ∪u∈N NN(u).
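Steps 2-1 to 2-3 can be sketched as follows. The radius appears only as an image in the original publication, so it is left here as a parameter r; the sample count m and all names are likewise assumptions.

```python
import math
import random

def sample_sphere_points(d, r, m):
    """Steps 2-1/2-2: sample m points on the surface of the d-dimensional
    sphere of radius r within the non-negative quadrant, by drawing d
    non-negative random numbers and rescaling the vector to length r."""
    N = []
    for _ in range(m):
        u = [random.random() for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in u))
        N.append(tuple(r * x / norm for x in u))
    return N

def build_core_set(N, data):
    """Step 2-3: the core set is the union of the nearest neighbors of
    the sampled points, C = union over u in N of NN(u)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return {min(data, key=lambda t: dist(u, t)) for u in N}
```

Rescaling a random non-negative vector, as the claim prescribes, puts every sample exactly on the sphere surface inside the non-negative quadrant.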
4. The core-set-based continuous regret-ratio minimization query method of claim 1, wherein the specific process of step 3 is as follows: for each point p in the core set, traverse all points u in N and determine whether p is the nearest neighbor of u, i.e., whether p = NN(u); if so, add u to Up.
5. The core-set-based continuous regret-ratio minimization query method of claim 1, wherein the specific process of step 4 is as follows:
4-1, let
Figure FDA0003152836460000024
4-2, compute the coverage value cov(R) of the result set as the size of the union of the sets Up over all points in the result set, i.e., cov(R) = |∪p∈R Up|; cov(R) indicates how many points in N have their nearest neighbor in R;
4-3, traverse all points p in the core set and find the point p′ such that the increase in the coverage value after adding p′ to the result set is larger than the increase after adding any other point, namely:
p′ = argmaxp∈C cov(R ∪ {p}) − cov(R),
and add the point p′ to the result set;
4-4, repeat steps 4-2 to 4-3 until the size of the result set equals the size specified by the user, then go to step 4-5;
4-5, record the regret-ratio minimization query result set obtained in step 4-4 and return it to the user.
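Assuming the core set is kept as a list (so that ties fall back to subscript order, as claim 6 specifies) and U maps each core point to its set Up, the greedy loop of steps 4-1 to 4-5 can be sketched as:

```python
def regret_query(core, U, k):
    """Steps 4-1 to 4-5: greedily pick k core points maximizing the
    coverage cov(R) = |union of U_p for p in R|."""
    R, covered = [], set()
    pool = list(core)        # list order doubles as subscript order
    for _ in range(min(k, len(pool))):
        # step 4-3: the point whose addition increases coverage the most;
        # max() keeps the earliest maximizer, i.e. the lowest subscript
        best = max(pool, key=lambda p: len(U.get(p, set()) - covered))
        pool.remove(best)
        R.append(best)       # add p' to the result set
        covered |= U.get(best, set())
    return R
```

Because each pick only needs the marginal gain |Up \ covered|, the loop is a standard greedy maximum-coverage selection over the core set.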
6. The core-set-based continuous regret-ratio minimization query method of claim 5, wherein in step 4-3, if a plurality of points achieve the same maximum increase in coverage value, the point is selected in subscript order.
7. The core-set-based continuous regret-ratio minimization query method of claim 1, wherein the specific content of step 6 is as follows:
6-1, determine whether the core set needs maintenance according to the two cases of tuple insertion and tuple deletion, respectively; if so, go to step 6-2, otherwise go to step 5;
6-2, adjust the core set and the sets Up according to the new nearest neighbor relationships, then go to step 4.
8. The core-set-based continuous regret-ratio minimization query method of claim 7, wherein in step 6, for the case of tuple insertion, the specific process is as follows:
A6-1, normalize the inserted tuple p; set up a set I and initialize it to be empty; for every point u in N, if the distance between u and its recorded nearest neighbor is greater than the distance between u and p, add u to I; if I is not empty, the core set must be maintained, so go to step A6-2; if I is empty, no maintenance is needed, so go to step 5;
A6-2, insert the tuple p into the core set and let Up = I; for every point u in I, do the following: let q = NN(u) be the previously recorded nearest neighbor of u, remove u from Uq and add it to Up, then judge whether Uq is an empty set; if so, remove q from the core set.
9. The core-set-based continuous regret-ratio minimization query method of claim 7, wherein in step 6, for the case of tuple deletion, the specific process is as follows:
B6-1, judge whether the tuple p is in the core set; if so, the core set must be maintained, so go to step B6-2; otherwise no maintenance is needed, so go to step 5;
B6-2, remove p from the core set; the original nearest neighbor of every point in Up has been deleted, so for every point u in Up do the following: find the nearest neighbor of u in the new data set, denoted q; remove u from Up and add it to Uq; judge whether q is already in the core set, and if not, add q to the core set.
CN202110770688.1A 2021-07-07 2021-07-07 Continuous regret-ratio minimization query method based on core set Active CN113448994B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110770688.1A CN113448994B (en) Continuous regret-ratio minimization query method based on core set

Publications (2)

Publication Number Publication Date
CN113448994A true CN113448994A (en) 2021-09-28
CN113448994B CN113448994B (en) 2023-02-03

Family

ID=77815406

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024077646A1 (en) * 2022-10-10 2024-04-18 深圳计算科学研究院 Incremental query method based on linear programming

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649489A (en) * 2016-09-28 2017-05-10 南京航空航天大学 Continuous skyline query processing mechanism in geographic text information data
CN108932251A (en) * 2017-05-25 2018-12-04 郑州大学 A kind of k- on the frequent updating data set based on sequence dominates search algorithm Skyline
US10200814B1 (en) * 2018-04-24 2019-02-05 The Florida International University Board Of Trustees Voronoi diagram-based algorithm for efficient progressive continuous k-nearest neighbor query for moving objects
CN112691383A (en) * 2021-01-14 2021-04-23 上海交通大学 Texas poker AI training method based on virtual regret minimization algorithm

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ABOLFAZL ASUDEH et al.: "Efficient Computation of Regret-ratio Minimizing Set: A Compact Maxima Representative", SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data *
DANUPON NANONGKAI et al.: "Regret-minimizing representative databases", Proceedings of the VLDB Endowment *
QI DONG et al.: "Faster Algorithms for k-Regret Minimizing Sets via Monotonicity and Sampling", CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management *
YANHAO WANG et al.: "A Fully Dynamic Algorithm for k-Regret Minimizing Sets", https://arxiv.org/pdf/2005.14493.pdf *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant