CN113448994B - Continuous regret rate minimization query method based on core set - Google Patents

Continuous regret rate minimization query method based on core set

Info

Publication number
CN113448994B
CN113448994B
Authority
CN
China
Prior art keywords
points
tuple
core set
core
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110770688.1A
Other languages
Chinese (zh)
Other versions
CN113448994A (en)
Inventor
郑吉平
马炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202110770688.1A priority Critical patent/CN113448994B/en
Publication of CN113448994A publication Critical patent/CN113448994A/en
Application granted granted Critical
Publication of CN113448994B publication Critical patent/CN113448994B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Abstract

The invention discloses a continuous regret rate minimization query method based on a core set. The method constructs an initial core set from the initial database, computes an initial regret rate minimization query result set from it, and then monitors the database for tuple changes, including the insertion of new tuples and the deletion of existing tuples; for each tuple change it maintains and updates the core set and the result set, so that continuous regret rate minimization queries over a non-static database are completed efficiently. By introducing a core set technique, the method builds the core set on the initial data set with nearest-neighbor search and computes the regret rate minimization query result set from the core set; as tuples are inserted into and deleted from the database, the nearest-neighbor relations and the core set are updated and the latest result set is computed from them, so that a series of real-time regret rate minimization query result sets is continuously returned to the user over a non-static database.

Description

Continuous regret rate minimization query method based on core set
Technical Field
The invention belongs to the technical field of databases, and particularly relates to a continuous regret rate minimization query method based on a core set.
Background
Extracting a few representative tuples from a database is an important function in many applications such as multi-criteria decision making, recommendation systems and web search. A regret rate minimization query selects a subset of fixed size so that the user is satisfied with the subset almost as much as with the entire database; because the result size is controllable and the user does not need to supply complex information, it has been widely used in the related fields. However, as information technology develops, static databases are no longer realistic, and many applications must work with non-static databases in which data tuples are inserted and deleted. The problem of how to select representative tuples also arises in this type of application. For example, in a restaurant reservation system, a restaurant's operating status, per-capita price, and so on may change over time: a restaurant opening or closing can be regarded as the insertion or deletion of a tuple in the database, while a change in per-capita price can be regarded as the modification of a tuple, which in turn can be treated as a deletion followed by an insertion. Clearly, the representative restaurants returned to the user differ at different points in time, so continuously obtaining a regret rate minimization query result set from a dynamically changing database to represent the real-time database becomes a problem that urgently needs to be solved.
For continuous regret rate minimization queries over a non-static database, the existing method [1] converts the query into a dynamic set cover problem. First, the preferences of a number of users are obtained by random sampling; if a tuple has the highest score under some preference, that tuple is said to cover the preference (or the preference is covered by the tuple). A set system consisting of a ground set and a collection of subsets is then built: the ground set is the randomly sampled preference set, and the set of preferences covered by each tuple in the database forms one element of the subset collection. The original problem thus becomes a set cover problem, namely selecting a fixed number of elements from the subset collection so that their union equals the ground set. The method performs a number of iterations; in each iteration it adds to the result set the tuple whose addition covers more preferences in total than the addition of any other tuple, finally obtaining a result set of fixed size. If the result set does not cover all sampled preferences, a smaller number of preferences is sampled anew and the process is repeated until a fixed-size result set covering all preferences is obtained and returned to the user. Afterwards, the set system and the result set are updated as tuples are inserted into or deleted from the database, which corresponds respectively to inserting and deleting elements of the subset collection, and the result set is updated so that it remains a set-cover solution of the new set system. In this way the method returns a series of regret rate minimization query result sets as the tuples in the database change. However, the method must update the set system after every tuple change, which makes it inefficient, and it must store the entire converted set system for subsequent updates; this huge set system occupies a large amount of space.
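For illustration only, the greedy selection step of this set-cover formulation can be sketched in Python as follows; this is a simplified sketch, not the full dynamic algorithm of [1], the random sampling of preferences and the dynamic updates are omitted, and the names preferences and covers are assumptions introduced here:

def greedy_set_cover(preferences, covers, k):
    # covers[t] is the set of sampled preferences under which tuple t scores highest;
    # greedily pick up to k tuples so that the union of covered preferences grows fastest.
    chosen, covered = [], set()
    while len(chosen) < k and covered != set(preferences):
        best = max((t for t in covers if t not in chosen),
                   key=lambda t: len(covers[t] - covered), default=None)
        if best is None:
            break
        chosen.append(best)
        covered |= covers[best]
    return chosen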
The efficiency of the solution strongly affects continuous regret rate minimization queries over a non-static database: if the method is too slow, a previous change may not have been fully processed when a new change arrives, which can cause congestion or even crash the system, and high space consumption likewise places heavy demands on the hardware that runs the method. The present invention combines continuous regret rate minimization queries over a non-static database with a core set, omits the update operations that do not affect the core set, solves the problem efficiently, stores only the information related to the core set, and therefore occupies few space resources.
The document mentioned above is derived from the following article:
[1] Yanhao Wang, Yuchen Li, Raymond Chi-Wing Wong, Kian-Lee Tan. A Fully Dynamic Algorithm for k-Regret Minimizing Sets. In Proceedings of the 37th International Conference on Data Engineering (ICDE), pages 1631-1642, 2021.
Disclosure of the Invention
The invention aims to provide a continuous regret rate minimization query method based on a core set, which builds a core set on the initial data set by introducing a core set technique and using nearest-neighbor search, computes a regret rate minimization query result set from the core set, updates the nearest-neighbor relations and the core set as tuples are inserted into and deleted from the database, and computes the latest result set from them, thereby efficiently and continuously returning a series of real-time regret rate minimization query result sets to the user over a non-static database.
In order to achieve the above purpose, the solution of the invention is:
a continuous regret rate minimization query method based on a core set comprises the following steps:
step 1, carrying out normalization processing on an original data set D with dimension d, so that the attribute values of all tuples in the original data set D lie in the [0,1] interval;
step 2, constructing a core set C of original data;
step 3, for each point p in the core set, respectively setting up and calculating the set U_p of points in N whose nearest neighbor is p, wherein N is a set of points randomly sampled on the part of the surface lying in the non-negative orthant of a d-dimensional sphere of fixed radius r centered at the origin;
step 4, calculating the regret rate minimization query result set R;
step 5, waiting for a tuple change in the database and preparing the corresponding processing; if no tuple insertion or deletion occurs in the database, the query process ends, otherwise step 6 is executed;
and step 6, maintaining the core set and the regret rate minimization query result set.
The specific content of step 1 is as follows: firstly, searching for and recording the maximum attribute value in each dimension over all tuples of the original data set D and of the change sequence, denoted m_1, m_2, …, m_d respectively; then assigning to each dimension of every tuple its original attribute value divided by the recorded maximum of the corresponding dimension, i.e.
p[i] ← p[i] / m_i, i = 1, …, d,
wherein p[i] represents the attribute value of tuple p in the i-th dimension; after this, the attribute values of all tuples lie in the [0,1] interval.
The specific content of step 2 is as follows:
2-1, letting C = ∅;
2-2, sampling a number of points u on the part of the surface lying in the non-negative orthant of the d-dimensional sphere of fixed radius r centered at the coordinate origin, the set formed by these points being denoted N; specifically, firstly obtaining d non-negative random numbers, denoted u_1, u_2, …, u_d, and then scaling them onto the sphere, i.e. letting
u_i ← r · u_i / √(u_1² + u_2² + … + u_d²), i = 1, …, d,
so that the tuple (u_1, u_2, …, u_d) represents one point randomly sampled on the part of the surface of the d-dimensional sphere of radius r lying in the non-negative orthant; adding the obtained point to the set N and repeating this process several times, finally obtaining a set N of points randomly sampled on that spherical surface;
2-3, for every point u in N, respectively searching for and recording its nearest neighbor NN(u) in the normalized data, i.e. the point of the normalized initial data with the smallest Euclidean distance to u, namely letting
NN(u) = argmin_{p∈D} ||u - p||,
and adding it to the core set, finally obtaining the core set C = ∪_{u∈N} NN(u).
The specific process of step 3 is as follows: for a given point p, traversing all points u in N and judging whether p is the nearest neighbor of u; if so, i.e. p = NN(u), adding u to U_p.
The specific process of step 4 is as follows:
4-1, letting R = ∅;
4-2, calculating the coverage value cov(R) of the result set as the size of the union of the sets U_p of all points in the result set, i.e. cov(R) = |∪_{p∈R} U_p|; cov(R) indicates how many points of N have their nearest neighbor in R;
4-3, traversing all points p in the core set and finding the point p′ whose addition increases the coverage value of the result set more than the addition of any other point, namely:
p′ = argmax_{p∈C} (cov(R ∪ {p}) - cov(R)),
and adding the point p′ to the result set;
4-4, repeating steps 4-2 to 4-3 until the size of the result set equals the size specified by the user, then going to step 4-5;
and 4-5, recording the regret rate minimization query result set obtained in step 4-4 and returning it to the user.
In step 4-3, if several points achieve the maximum coverage value increase at the same time, the point is selected according to subscript order.
The specific content of step 6 is:
6-1, judging, for the two cases of tuple insertion and tuple deletion respectively, whether the core set needs maintenance; if so, going to step 6-2, and if not, going to step 5;
6-2, adjusting the core set and the sets U_p according to the new nearest-neighbor relations, and then going to step 4.
In step 6, for tuple insertion, the specific process is as follows:
A6-1, normalizing the tuple p, setting up a set I and initializing it to be empty; for every point u in N, if the distance from u to its recorded nearest neighbor is greater than the distance from u to p, adding u to I; if I is not empty, the core set needs maintenance and the process goes to step A6-2; if I is empty, no maintenance is needed and the process goes to step 5;
A6-2, inserting the tuple p into the core set and letting U_p = I; for every point u in I, doing the following respectively: letting q = NN(u) be the previously recorded nearest neighbor of u, removing u from U_q and adding it to U_p, judging whether U_q is now an empty set, and if so removing q from the core set.
In step 6, for the case of tuple deletion, the specific process is as follows:
B6-1, judging whether the tuple p is in the core set; if so, the core set needs maintenance and the process goes to step B6-2; otherwise no maintenance is needed and the process goes to step 5;
B6-2, removing p from the core set; the original nearest neighbor of every point in U_p has been deleted, so doing the following for each such point u: searching for its nearest neighbor in the new data set and denoting it q, removing u from U_p and adding it to U_q, judging whether q is already in the core set, and if not adding q to the core set.
Compared with the prior art, the above scheme of the invention has the following beneficial effects:
(1) By updating the regret rate minimization query result set on the basis of core set maintenance, the method omits the update operations that do not affect the core set; compared with the prior art, which must update the set system for every change, this greatly reduces the average processing time per data change and completes continuous regret rate minimization queries more efficiently;
(2) The prior art must store a huge set system during processing, which occupies a large amount of storage space; in contrast, the present invention stores only the information related to the core set and has no such large storage requirement.
Drawings
FIG. 1 is an overall flow diagram of the present invention;
FIG. 2 is a schematic diagram of a core set construction process for a 2-dimensional case;
points on the 1/4 circle in the figure represent the randomly sampled points u; points inside the square frame represent the normalized tuples of the database; a connecting line represents the nearest-neighbor relation between two points, the nearest neighbor of the starting point u being the end point NN(u); points circled with a dotted line represent the points selected into the core set;
FIG. 3 is a flow diagram of computing a result set;
FIG. 4 is a diagram illustrating the nearest-neighbor relations between the relevant points in an embodiment of the invention;
wherein u_i denotes the 5 randomly sampled points of the embodiment, which lie on the 1/4 circle of radius r centered at the origin in the non-negative quadrant; p_i denotes the normalized tuples of the database, which lie inside the square frame formed by the coordinate axes and the dotted lines because their coordinates are all in the [0,1] interval; a solid connecting line represents a nearest-neighbor relation between two points, the nearest neighbor of the starting point being the end point; a dashed connecting line represents a nearest-neighbor relation deleted because of a tuple change; a dotted connecting line represents a nearest-neighbor relation added by the current tuple change;
in FIG. 4, (a) shows the nearest-neighbor relations on the initial data set, (b) shows the nearest-neighbor relations after inserting p_6, (c) shows the nearest-neighbor relations after inserting p_7, and (d) shows the nearest-neighbor relations after deleting p_4 and p_2.
Detailed Description
The invention provides a continuous regret rate minimization query method based on a core set: an initial core set is first constructed from the initial database, and an initial regret rate minimization query result set is calculated from it; the database is then monitored for tuple changes, including the insertion of new tuples and the deletion of existing tuples, and for each tuple change the core set and the result set are maintained and updated, so that continuous regret rate minimization queries over a non-static database are completed efficiently.
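As an illustration of this overall flow, the following Python sketch shows one possible way to organize the monitor-and-maintain loop; the function names normalize, build_coreset, compute_result_set, handle_insert and handle_delete are illustrative placeholders for steps 1-4 and 6 described below, and the (op, tuple) change format is an assumption introduced here:

# Minimal driver sketch of the overall flow in Fig. 1; the helper functions are
# hypothetical names standing in for the steps detailed in the text below.
def continuous_regret_query(initial_data, changes, k, num_samples, radius):
    data, maxima = normalize(initial_data, changes)                 # step 1
    coreset, nn, U, N = build_coreset(data, num_samples, radius)    # steps 2-3
    results = [compute_result_set(coreset, U, k)]                   # step 4: initial result set
    for op, raw in changes:                                         # step 5: wait for tuple changes
        t = tuple(raw[i] / maxima[i] for i in range(len(raw)))      # normalize the changed tuple
        if op == "insert":
            handle_insert(data, coreset, nn, U, t)                  # step 6, case a)
        else:
            handle_delete(data, coreset, nn, U, t)                  # step 6, case b)
        results.append(compute_result_set(coreset, U, k))           # return the latest result set
    return results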
As shown in fig. 1, the present invention comprises the steps of:
step 1, carrying out normalization processing on the original data set D with dimension d, so that the attribute values of all tuples in the original data set lie in the [0,1] interval;
the specific operation of step 1 is as follows: firstly, searching for and recording the maximum attribute value in each dimension over all tuples of the data set D and of the change sequence, denoted m_1, m_2, …, m_d respectively; then assigning to each dimension of every tuple its original attribute value divided by the recorded maximum of the corresponding dimension, i.e.
p[i] ← p[i] / m_i, i = 1, …, d,
wherein p[i] represents the attribute value of tuple p in the i-th dimension; after this conversion, the attribute values of all tuples lie in the [0,1] interval;
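A minimal Python sketch of this normalization step, assuming the data set and the change sequence are given as lists of numeric tuples (the function name normalize and the (op, tuple) change format are assumptions for illustration only):

def normalize(dataset, change_sequence=()):
    # Step 1: divide each attribute by the per-dimension maximum taken over the
    # initial data set and the change sequence, so all values fall into [0, 1].
    all_tuples = list(dataset) + [t for _, t in change_sequence]
    d = len(all_tuples[0])
    maxima = [max(t[i] for t in all_tuples) for i in range(d)]              # m_1, ..., m_d
    normalized = [tuple(t[i] / maxima[i] for i in range(d)) for t in dataset]
    return normalized, maxima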
step 2, constructing a core set C of the original data, and as shown in fig. 2, mainly including the following steps:
2-1, initializing the core set to an empty set, i.e. letting C = ∅;
2-2, sampling a number of points u on the part of the surface lying in the non-negative orthant of the d-dimensional sphere of fixed radius r centered at the coordinate origin, the set formed by these points being denoted N; specifically, firstly obtaining d non-negative random numbers, denoted u_1, u_2, …, u_d, and then scaling them onto the sphere, i.e. letting
u_i ← r · u_i / √(u_1² + u_2² + … + u_d²), i = 1, …, d,
so that the tuple (u_1, u_2, …, u_d) represents one point randomly sampled on the part of the surface of the d-dimensional sphere of radius r lying in the non-negative orthant; adding the obtained point to the set N and repeating this process several times, finally obtaining a set N of points randomly sampled on that spherical surface;
2-3, for every point u in N, respectively searching for and recording its nearest neighbor NN(u) in the normalized data, i.e. the point of the normalized initial data with the smallest Euclidean distance to u, namely letting
NN(u) = argmin_{p∈D} ||u - p||,
and adding it to the core set, finally obtaining the core set C = ∪_{u∈N} NN(u).
Step 3, for each point p in the core set, respectively setting up and calculating the set U_p = {u | u ∈ N ∧ p = NN(u)} of points in N whose nearest neighbor is p; the specific process is as follows: for a given point p, traverse all points u in N and judge, according to the nearest-neighbor relations recorded in step 2-3, whether p is the nearest neighbor of u; if so, i.e. p = NN(u), add u to U_p. Each point in the core set corresponds to one set U_p, and U_p represents the set of all points in N whose nearest neighbor is p;
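Steps 2 and 3 can be sketched in Python as follows; the sampling radius is passed in as a parameter r because its concrete value is given by a formula in the original drawings, and the random and math modules are used only to illustrate the sampling and the Euclidean nearest-neighbor search:

import math
import random

def sample_sphere_point(d, r):
    # Draw d non-negative random numbers and scale them onto the sphere of radius r
    # centered at the origin (surface portion in the non-negative orthant).
    v = [random.random() for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(r * x / norm for x in v)

def build_coreset(data, num_samples, r):
    # Steps 2-3: sample the set N, record each sample's nearest neighbor NN(u) in the
    # normalized data, and collect the sets U_p of samples represented by each point p.
    d = len(data[0])
    N = [sample_sphere_point(d, r) for _ in range(num_samples)]
    nn = {}                       # nn[u] = NN(u)
    U = {}                        # U[p] = {u in N : NN(u) = p}
    for u in N:
        p = min(data, key=lambda q: math.dist(u, q))   # Euclidean nearest neighbor
        nn[u] = p
        U.setdefault(p, set()).add(u)
    coreset = set(U)              # C = union of NN(u) over all u in N
    return coreset, nn, U, N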
step 4, calculating the regret rate minimization query result set R; referring to fig. 3, the process specifically comprises:
4-1, initializing the regret rate minimization query result set to an empty set, i.e. letting R = ∅;
4-2, calculating the coverage value cov(R) of the result set as the size of the union of the sets U_p of all points in the result set, i.e. cov(R) = |∪_{p∈R} U_p|; cov(R) indicates how many points of N have their nearest neighbor in R;
4-3, traversing all points p in the core set and finding the point p′ whose addition increases the coverage value of the result set more than the addition of any other point, namely:
p′ = argmax_{p∈C} (cov(R ∪ {p}) - cov(R)),
and adding the point p′ to the result set; if several points achieve the maximum coverage value increase at the same time, the point is selected according to subscript order, but when the number of randomly sampled points is large, i.e. the set N is large, this situation hardly ever occurs;
4-4, repeating steps 4-2 to 4-3 until the size of the result set equals the size specified by the user, then going to step 4-5;
4-5, recording the regret rate minimization query result set obtained in step 4-4 and returning it to the user;
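A Python sketch of this greedy maximum-coverage selection (step 4), reusing the coreset and U structures from the construction sketch above; ties are broken arbitrarily here, whereas the text breaks them by subscript order:

def compute_result_set(coreset, U, k):
    # Step 4: greedily pick up to k core-set points maximizing the number of
    # covered samples, cov(R) = |union of U_p over p in R|.
    R, covered = [], set()
    while len(R) < min(k, len(coreset)):
        # choose the point whose U_p contributes the most not-yet-covered samples
        best = max((p for p in coreset if p not in R),
                   key=lambda p: len(U.get(p, set()) - covered))
        R.append(best)
        covered |= U.get(best, set())
    return R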
step 5, waiting for a tuple change in the database and preparing the corresponding processing; if no tuple insertion or deletion occurs in the database, the query process ends, otherwise step 6 is executed;
step 6, maintaining the core set and the regret rate minimization query result set, which specifically comprises:
6-1, judging whether the core set needs maintenance according to whether a tuple was inserted or deleted, analyzed according to the following two cases; if the core set needs maintenance, executing step 6-2, and if not, going to step 5:
a) Tuple p is inserted: assign to each dimension of p its original attribute value divided by the maximum attribute value of the corresponding dimension recorded in step 1, i.e. normalize p; then set up a set I and initialize it to be empty; for every point u in N, if the distance from u to its recorded nearest neighbor is greater than the distance from u to p, i.e. ||u - NN(u)|| > ||u - p||, add u to I, so that I represents the set of points whose nearest neighbor is affected by the insertion of the tuple p, i.e. the points in I take p as their new nearest neighbor; if I is not empty, the core set needs maintenance, and if I is empty, no maintenance is needed;
b) Tuple p is deleted: if p is in the core set, the core set needs maintenance; otherwise no maintenance is needed.
6-2, adjusting the core set and the sets U_p according to the new nearest-neighbor relations, and then going to step 4; the adjustment is performed differently for the two cases of tuple insertion and deletion:
a) Tuple p is inserted: insert the point p into the core set and let U_p = I, where I is the set obtained in step 6-1; the new nearest neighbor of every point in I has become p, so do the following for each point u in I: let q = NN(u) be the previously recorded nearest neighbor of u, remove u from U_q and add it to U_p, then judge whether U_q is now an empty set, and if so remove q from the core set;
b) Tuple p is deleted: remove the point p from the core set; the original nearest neighbor of every point in U_p has been deleted, so do the following for each such point u: search for its nearest neighbor in the new data set and denote it q, remove u from U_p and add it to U_q, then judge whether q is already in the core set, and if not add q to the core set.
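The two maintenance cases of step 6 can be sketched in Python as follows, continuing the data structures used above (data is the list of normalized tuples, nn maps each sample u to NN(u), and U maps each core-set point p to U_p); this is a simplified sketch, and the re-computation of the result set after maintenance is left to the caller:

import math

def handle_insert(data, coreset, nn, U, p):
    # Step 6, case a): p is a newly inserted, already normalized tuple.
    data.append(p)
    I = [u for u in nn if math.dist(u, nn[u]) > math.dist(u, p)]   # samples now closer to p
    if not I:                      # the core set is unaffected, nothing to maintain
        return
    coreset.add(p)
    U[p] = set(I)
    for u in I:
        q = nn[u]                  # previously recorded nearest neighbor of u
        U[q].discard(u)
        nn[u] = p                  # p becomes the new nearest neighbor of u
        if not U[q]:               # q no longer represents any sample
            coreset.discard(q)
            del U[q]

def handle_delete(data, coreset, nn, U, p):
    # Step 6, case b): tuple p is removed from the database.
    data.remove(p)
    if p not in coreset:           # the core set is unaffected, nothing to maintain
        return
    coreset.discard(p)
    for u in U.pop(p, set()):      # re-attach every sample whose nearest neighbor was p
        q = min(data, key=lambda t: math.dist(u, t))   # new nearest neighbor of u
        nn[u] = q
        coreset.add(q)
        U.setdefault(q, set()).add(u)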
The technical solution of the present invention will be described in detail by a specific example.
This embodiment assumes that the data dimension equals 2, i.e. d = 2, and that the result set size required by the user is 2. Assume first that the initial database contains the tuples p_1, p_2, p_3, p_4, p_5, with the following attribute value information:
Table 1: attribute values of the initial tuples p_1 to p_5.
Assume that the database changes four times in total, in the following order: insert p_6(90, 105), insert p_7(160, 30), delete p_4, delete p_2.
Step (1): search for the maximum attribute value in each dimension over all tuples of the initial database and of the change sequence, obtaining m_1 = p_7[1] = 160 and m_2 = p_2[2] = 150; then assign to each dimension of every tuple its original attribute value divided by the recorded maximum of the corresponding dimension, e.g. p_1[1] = 100/160 = 0.625, p_2[1] = 45/160 ≈ 0.281, and so on for the remaining values. The normalized attribute value information of each tuple is then as follows:
Table 2: normalized attribute values of each tuple.
Step (2-1): let C = ∅.
Step (2-2): obtain d = 2 non-negative random numbers, say u_1 = 71 and u_2 = 29, and scale them onto the sampling sphere as described in step 2-2 to obtain one sampled point.
This is repeated several times (in this embodiment the number of samples is assumed to be 5), and the information of each point in the resulting set N is assumed to be as follows:
Table 3: coordinates of the sampled points u_1 to u_5 in N.
Step (2-3): ||u_1 - p_1|| = 1.814, ||u_1 - p_2|| = 2.498, ||u_1 - p_3|| = 2.085, ||u_1 - p_4|| = 1.955, ||u_1 - p_5|| = 2.273; comparing these values shows that p_1 is closest to u_1, so the nearest neighbor of u_1 is NN(u_1) = p_1, and p_1 is added to the core set. Similarly, the distance information between the remaining points is as follows:
Table 4: Euclidean distances between each sampled point u_i and each tuple p_j.
This yields NN(u_2) = p_2, NN(u_3) = p_3, NN(u_4) = p_1, NN(u_5) = p_2; the points and their nearest neighbors are shown in FIG. 4(a). The nearest neighbor recorded for each point at this time is as follows:
Table 5: NN(u_1) = p_1, NN(u_2) = p_2, NN(u_3) = p_3, NN(u_4) = p_1, NN(u_5) = p_2.
So the core set C = {p_1, p_2, p_3}.
Step (3): because p_1 = NN(u_1) and p_1 = NN(u_4), U_{p1} = {u_1, u_4}; similarly U_{p2} = {u_2, u_5} and U_{p3} = {u_3}. The U_p information of each point in the core set at this time is as follows:
Table 6: U_{p1} = {u_1, u_4}, U_{p2} = {u_2, u_5}, U_{p3} = {u_3}.
Step (4-1): let R = ∅.
Step (4-2): compute cov(R) = 0.
Step (4-3): cov(R ∪ {p_1}) - cov(R) = |U_{p1}| - 0 = 2, cov(R ∪ {p_2}) - cov(R) = |U_{p2}| - 0 = 2, cov(R ∪ {p_3}) - cov(R) = |U_{p3}| - 0 = 1; p_1 and p_2 achieve the maximum coverage increase at the same time, so p_1 is chosen by subscript order and added to the result set, at which point R = {p_1}.
Step (4-4): the size of R is 1 and the user-specified result set size is 2, so steps (4-2) to (4-3) are repeated.
Step (4-2): compute cov(R) = |U_{p1}| = 2.
Step (4-3): cov(R ∪ {p_1}) - cov(R) = |U_{p1} ∪ U_{p1}| - |U_{p1}| = 2 - 2 = 0, cov(R ∪ {p_2}) - cov(R) = |U_{p1} ∪ U_{p2}| - |U_{p1}| = 4 - 2 = 2, cov(R ∪ {p_3}) - cov(R) = |U_{p1} ∪ U_{p3}| - |U_{p1}| = 3 - 2 = 1, so p_2 is added to the result set, at which point R = {p_1, p_2}.
Step (4-4): the size of R is 2, equal to the result set size specified by the user, so go to step (4-5).
Step (4-5): record and return the current regret rate minimization query result set R = {p_1, p_2}.
Step (5): the database changes: p_6(90, 105) is inserted, so step (6-1) is executed.
Step (6-1): normalize the attribute values of p_6, obtaining p_6(90/160, 105/150), i.e. p_6(0.563, 0.7). The distances between p_6 and each point in N are as follows:
Table 7: distances between p_6 and each point in N.
Because ||u_1 - NN(u_1)|| = ||u_1 - p_1|| = 1.848 < ||u_1 - p_6|| = 1.979, u_1 is not added to I; similarly,
||u_2 - NN(u_2)|| = ||u_2 - p_2|| = 1.420 < ||u_2 - p_6|| = 1.804, so u_2 is not added to I;
||u_3 - NN(u_3)|| = ||u_3 - p_3|| = 1.565 > ||u_3 - p_6|| = 1.524, so u_3 is added to I;
||u_4 - NN(u_4)|| = ||u_4 - p_1|| = 1.722 > ||u_4 - p_6|| = 1.685, so u_4 is added to I;
||u_5 - NN(u_5)|| = ||u_5 - p_2|| = 1.464 < ||u_5 - p_6|| = 1.573, so u_5 is not added to I.
Thus I = {u_3, u_4}, which is not empty, so core set maintenance is needed and step (6-2) is executed.
Step (6-2): add p_6 to the core set and let U_{p6} = I = {u_3, u_4}; operate on each point in I separately. For u_3, q = NN(u_3) = p_3; remove u_3 from U_{p3}; U_{p3} is now an empty set, so p_3 is removed from the core set. For u_4, q = NN(u_4) = p_1; remove u_4 from U_{p1}; U_{p1} = {u_1} is not an empty set. The U_p information of each point in the core set at this time is as follows:
Table 8: U_{p1} = {u_1}, U_{p2} = {u_2, u_5}, U_{p6} = {u_3, u_4}.
The relation between the points and their nearest neighbors is shown in FIG. 4(b). The updated nearest neighbors at this time are as follows:
Table 9: NN(u_1) = p_1, NN(u_2) = p_2, NN(u_3) = p_6, NN(u_4) = p_6, NN(u_5) = p_2.
Then the process proceeds to step (4a).
Step (4a) is executed in the same way as step (4) above and is not repeated; the regret rate minimization query result set at this time, R = {p_2, p_6}, is returned.
(The first change has now been processed.)
Step (5a): the database changes: p_7(160, 30) is inserted, so step (6-1a) is executed.
Step (6-1a): normalize the attribute values of p_7, obtaining p_7(160/160, 30/150), i.e. p_7(1, 0.2). The distances between p_7 and each point in N are as follows:
Table 10: distances between p_7 and each point in N.
Because ||u_1 - NN(u_1)|| = ||u_1 - p_1|| = 1.848 > ||u_1 - p_7|| = 1.428, u_1 is added to I; similarly,
||u_2 - NN(u_2)|| = ||u_2 - p_2|| = 1.420 < ||u_2 - p_7|| = 2.429, so u_2 is not added to I;
||u_3 - NN(u_3)|| = ||u_3 - p_6|| = 1.524 < ||u_3 - p_7|| = 1.665, so u_3 is not added to I;
||u_4 - NN(u_4)|| = ||u_4 - p_6|| = 1.685 > ||u_4 - p_7|| = 1.425, so u_4 is added to I;
||u_5 - NN(u_5)|| = ||u_5 - p_2|| = 1.464 < ||u_5 - p_7|| = 2.036, so u_5 is not added to I.
Thus I = {u_1, u_4}, which is not empty, so core set maintenance is needed and step (6-2a) is executed.
Step (6-2a): add p_7 to the core set and let U_{p7} = I = {u_1, u_4}; operate on each point in I separately. For u_1, q = NN(u_1) = p_1; remove u_1 from U_{p1}; U_{p1} is now an empty set, so p_1 is removed from the core set. For u_4, q = NN(u_4) = p_6; remove u_4 from U_{p6}; U_{p6} = {u_3} is not an empty set. The U_p information of each point in the core set at this time is as follows:
Table 11: U_{p2} = {u_2, u_5}, U_{p6} = {u_3}, U_{p7} = {u_1, u_4}.
The relation between the points and their nearest neighbors is shown in FIG. 4(c). The updated nearest neighbors at this time are as follows:
Table 12: NN(u_1) = p_7, NN(u_2) = p_2, NN(u_3) = p_6, NN(u_4) = p_7, NN(u_5) = p_2.
Then the process proceeds to step (4b).
Step (4b) is executed in the same way as step (4) above; the regret rate minimization query result set at this time, R = {p_2, p_7}, is returned.
(The second change has now been processed.)
Step (5b): the database changes: p_4 is deleted, so step (6-1b) is executed.
Step (6-1b): because p_4 is not in the core set, no core set maintenance is needed, and step (5c) is executed.
(The third change has now been processed.)
Step (5c): the database changes: p_2 is deleted, so step (6-1c) is executed.
Step (6-1c): p_2 is in the core set, so core set maintenance is needed and step (6-2c) is executed.
Step (6-2c): remove p_2 from the core set; U_{p2} = {u_2, u_5}, and the nearest neighbor of each of these points is searched for again. The distance information of the relevant points is as follows:
Table 13: distances between u_2, u_5 and the remaining tuples.
Thus, for u_2, searching for its nearest neighbor in the latest data set gives NN(u_2) = p_5; since p_5 is not in the core set, p_5 is added to the core set and u_2 is added to U_{p5}. For u_5, searching for its nearest neighbor in the latest data set gives NN(u_5) = p_5; p_5 is already in the core set (it was added when u_2 was processed), so u_5 is added to U_{p5}. The U_p information of each point in the core set at this time is as follows:
Table 14: U_{p5} = {u_2, u_5}, U_{p6} = {u_3}, U_{p7} = {u_1, u_4}.
The relation between the points and their nearest neighbors is shown in FIG. 4(d). The updated nearest neighbors at this time are as follows:
Table 15: NN(u_1) = p_7, NN(u_2) = p_5, NN(u_3) = p_6, NN(u_4) = p_7, NN(u_5) = p_5.
Then the process proceeds to step (4d).
Step (4d) is executed in the same way as step (4) above; the regret rate minimization query result set at this time, R = {p_5, p_7}, is returned.
(The fourth change has now been processed.)
Step (5d): the database does not change any more, and the method ends.
In summary, the continuous regret rate minimization query method based on a core set of the present invention considers the influence of tuple insertions and deletions in the database on the regret rate minimization query result, constructs a core set based on nearest-neighbor search, and performs regret rate minimization queries with the core set and the maximum coverage method. On the basis of the initial core set and the initial regret rate minimization query result set, and given that the insertion or deletion of a tuple in the database may change the core set and the result set, once a tuple is inserted or deleted the method judges whether the core set needs to change and triggers core set maintenance and result set update, so as to meet the requirement of continuous regret rate minimization queries. The invention uses the nearest-neighbor relations in the core set to adjust the core set rapidly, effectively reducing the maintenance time and improving the efficiency of continuous regret rate minimization queries over a non-static database.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modifications made on the basis of the technical scheme according to the technical idea of the present invention fall within the protection scope of the present invention.

Claims (7)

1. A continuous regret rate minimization query method based on a core set is characterized by comprising the following steps:
step 1, carrying out normalization processing on an original data set D with dimension d, so that the attribute values of all tuples in the original data set D lie in the [0,1] interval;
step 2, constructing a core set C of original data;
step 3, for each point p in the core set, respectively setting up and calculating the set U_p of points in N whose nearest neighbor is p, wherein N is a set of points randomly sampled on the part of the surface lying in the non-negative orthant of a d-dimensional sphere of fixed radius r centered at the origin;
step 4, calculating the regret rate minimization query result set R;
the specific process of step 4 is as follows:
4-1, letting R = ∅;
4-2, calculating the coverage value cov(R) of the result set as the size of the union of the sets U_p of all points in the result set, i.e. cov(R) = |∪_{p∈R} U_p|, wherein cov(R) indicates how many points of N have their nearest neighbor in R;
4-3, traversing all points p in the core set and finding the point p′ whose addition increases the coverage value of the result set more than the addition of any other point, namely:
p′ = argmax_{p∈C} (cov(R ∪ {p}) - cov(R)),
and adding the point p′ to the result set;
4-4, repeating steps 4-2 to 4-3 until the size of the result set equals the size specified by the user, then going to step 4-5;
4-5, recording the regret rate minimization query result set obtained in step 4-4 and returning it to the user;
step 5, waiting for a tuple change in the database and preparing the corresponding processing; if no tuple insertion or deletion occurs in the database, the query process ends, otherwise step 6 is executed;
step 6, maintaining the core set and the regret rate minimization query result set;
the specific content of step 6 is as follows:
6-1, judging, for the two cases of tuple insertion and tuple deletion respectively, whether the core set needs maintenance; if so, going to step 6-2, and if not, going to step 5;
6-2, adjusting the core set and the sets U_p according to the new nearest-neighbor relations, and then going to step 4.
2. The core-set-based continuous regret rate minimization query method of claim 1, wherein the specific content of step 1 is as follows: firstly, searching for and recording the maximum attribute value in each dimension over all tuples of the original data set D and of the change sequence, denoted m_1, m_2, …, m_d respectively; then assigning to each dimension of every tuple its original attribute value divided by the recorded maximum of the corresponding dimension, i.e.
s[i] ← s[i] / m_i, i = 1, …, d,
wherein s[i] represents the attribute value of tuple s in the i-th dimension; after this, the attribute values of all tuples lie in the [0,1] interval.
3. The core-set-based continuous regret rate minimization query method of claim 1, wherein the specific content of step 2 is as follows:
2-1, letting C = ∅;
2-2, sampling a number of points u on the part of the surface lying in the non-negative orthant of the d-dimensional sphere of fixed radius r centered at the coordinate origin, the set formed by these points being denoted N; specifically, firstly obtaining d non-negative random numbers, denoted u_1, u_2, …, u_d, and then scaling them onto the sphere, i.e. letting
u_i ← r · u_i / √(u_1² + u_2² + … + u_d²), i = 1, …, d,
so that the tuple (u_1, u_2, …, u_d) represents one point randomly sampled on the part of the surface of the d-dimensional sphere of radius r lying in the non-negative orthant; adding the obtained point to the set N and repeating this process several times, finally obtaining a set N of points randomly sampled on that spherical surface;
2-3, for every point u in N, respectively searching for and recording its nearest neighbor NN(u) in the normalized data, i.e. the point of the normalized initial data with the smallest Euclidean distance to u, namely letting
NN(u) = argmin_{s∈D} √(Σ_{i=1}^{d} (u[i] - s[i])²),
adding it to the core set, and finally obtaining the core set C = ∪_{u∈N} NN(u), wherein s[i] represents the attribute value of tuple s in the i-th dimension.
4. The core-set-based continuous regret rate minimization query method of claim 1, wherein the specific process of step 3 is as follows: for a given point p, traversing all points u in N and judging whether p is the nearest neighbor of u; if so, i.e. p = NN(u), adding u to U_p.
5. The core-set-based continuous regret rate minimization query method of claim 1, wherein in step 4-3, if several points achieve the maximum coverage value increase at the same time, the point is selected according to subscript order.
6. The core-set-based continuous regret rate minimization query method of claim 1, wherein in step 6, for the case of tuple insertion, the specific process is as follows:
A6-1, normalizing the tuple p, setting up a set I and initializing it to be empty; for every point u in N, if the distance from u to its recorded nearest neighbor is greater than the distance from u to p, adding u to I; if I is not empty, the core set needs maintenance and the process goes to step A6-2; if I is empty, no maintenance is needed and the process goes to step 5;
A6-2, inserting the tuple p into the core set and letting U_p = I; for every point u in I, doing the following respectively: letting q = NN(u) be the previously recorded nearest neighbor of u, removing u from U_q and adding it to U_p, judging whether U_q is now an empty set, and if so removing q from the core set.
7. The core-set-based continuous regret rate minimization query method of claim 1, wherein in step 6, for the case of tuple deletion, the specific process is as follows:
B6-1, judging whether the tuple p is in the core set; if so, the core set needs maintenance and the process goes to step B6-2; otherwise no maintenance is needed and the process goes to step 5;
B6-2, removing p from the core set; the original nearest neighbor of every point in U_p has been deleted, so doing the following for each such point u: searching for its nearest neighbor in the new data set and denoting it q, removing u from U_p and adding it to U_q, judging whether q is already in the core set, and if not adding q to the core set.
CN202110770688.1A 2021-07-07 2021-07-07 Continuous regret rate minimization query method based on core set Active CN113448994B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110770688.1A CN113448994B (en) 2021-07-07 2021-07-07 Continuous regret rate minimization query method based on core set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110770688.1A CN113448994B (en) 2021-07-07 2021-07-07 Continuous regret rate minimization query method based on core set

Publications (2)

Publication Number Publication Date
CN113448994A CN113448994A (en) 2021-09-28
CN113448994B true CN113448994B (en) 2023-02-03

Family

ID=77815406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110770688.1A Active CN113448994B (en) 2021-07-07 2021-07-07 Continuous regret rate minimization query method based on core set

Country Status (1)

Country Link
CN (1) CN113448994B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115563155A (en) * 2022-10-10 2023-01-03 深圳计算科学研究院 Incremental query method based on linear programming

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649489A (en) * 2016-09-28 2017-05-10 南京航空航天大学 Continuous skyline query processing mechanism in geographic text information data
CN108932251A (en) * 2017-05-25 2018-12-04 郑州大学 A sorting-based k-dominant Skyline search algorithm over frequently updated data sets
US10200814B1 (en) * 2018-04-24 2019-02-05 The Florida International University Board Of Trustees Voronoi diagram-based algorithm for efficient progressive continuous k-nearest neighbor query for moving objects

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112691383A (en) * 2021-01-14 2021-04-23 上海交通大学 Texas poker AI training method based on virtual regret minimization algorithm

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649489A (en) * 2016-09-28 2017-05-10 南京航空航天大学 Continuous skyline query processing mechanism in geographic text information data
CN108932251A (en) * 2017-05-25 2018-12-04 郑州大学 A sorting-based k-dominant Skyline search algorithm over frequently updated data sets
US10200814B1 (en) * 2018-04-24 2019-02-05 The Florida International University Board Of Trustees Voronoi diagram-based algorithm for efficient progressive continuous k-nearest neighbor query for moving objects

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A Fully Dynamic Algorithm for k-Regret Minimizing Sets; Yanhao Wang et al.; https://arxiv.org/pdf/2005.14493.pdf; 2020-12-31; pages 1-15 *
Efficient Computation of Regret-ratio Minimizing Set: A Compact Maxima Representative; Abolfazl Asudeh et al.; SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data; 2017-05-31; pages 821-834 *
Faster Algorithms for k-Regret Minimizing Sets via Monotonicity and Sampling; Qi Dong et al.; CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management; 2019-11-30; pages 2213-2216 *
Regret-minimizing representative databases; Danupon Nanongkai et al.; Proceedings of the VLDB Endowment; 2010-09-30; vol. 3; pages 1114-1124 *

Also Published As

Publication number Publication date
CN113448994A (en) 2021-09-28

Similar Documents

Publication Publication Date Title
CN108241745B (en) Sample set processing method and device and sample query method and device
US10102253B2 (en) Minimizing index maintenance costs for database storage regions using hybrid zone maps and indices
CN110134714B (en) Distributed computing framework cache index method suitable for big data iterative computation
US10754887B1 (en) Systems and methods for multimedia image clustering
CN107341178B (en) Data retrieval method based on self-adaptive binary quantization Hash coding
CN109829066B (en) Local sensitive Hash image indexing method based on hierarchical structure
US20180357281A1 (en) Class specific context aware query processing
US20220005546A1 (en) Non-redundant gene set clustering method and system, and electronic device
Tang et al. Efficient Processing of Hamming-Distance-Based Similarity-Search Queries Over MapReduce.
US10642918B2 (en) Efficient publish/subscribe systems
CN111552710A (en) Query optimization method for distributed database
CN110334290B (en) MF-Octree-based spatio-temporal data rapid retrieval method
CN110888880A (en) Proximity analysis method, device, equipment and medium based on spatial index
CN113448994B (en) Continuous regrettage minimization query method based on core set
Huang et al. A clustering based approach for skyline diversity
CN110069500B (en) Dynamic mixed indexing method for non-relational database
CN108549696B (en) Time series data similarity query method based on memory calculation
JP2010277329A (en) Neighborhood retrieval device
CN110209895B (en) Vector retrieval method, device and equipment
CN112162986B (en) Parallel top-k range skyline query method and system
Zhou et al. A novel locality-sensitive hashing algorithm for similarity searches on large-scale hyperspectral data
Lu et al. Dynamic Partition Forest: An Efficient and Distributed Indexing Scheme for Similarity Search based on Hashing
Lin Efficient and compact indexing structure for processing of spatial queries in line-based databases
US11822582B2 (en) Metadata clustering
CN116304253B (en) Data storage method, data retrieval method and method for identifying similar video

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant