CN113448994B - Continuous regret rate minimization query method based on core set - Google Patents

Continuous regret rate minimization query method based on core set

Info

Publication number
CN113448994B
CN113448994B
Authority
CN
China
Prior art keywords
points
tuple
core set
core
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110770688.1A
Other languages
Chinese (zh)
Other versions
CN113448994A (en)
Inventor
郑吉平
马炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202110770688.1A priority Critical patent/CN113448994B/en
Publication of CN113448994A publication Critical patent/CN113448994A/en
Application granted granted Critical
Publication of CN113448994B publication Critical patent/CN113448994B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Abstract

The invention discloses a continuous regret rate minimization query method based on a core set. The method constructs an initial core set from the initial database, computes an initial regret rate minimization query result set from it, and then monitors the database for tuple changes, including the insertion of new tuples and the deletion of existing tuples; for each tuple change it maintains and updates the core set and the result set, so that continuous regret rate minimization queries over a non-static database are completed efficiently. By introducing a core set technique, the method builds the core set on the initial data set with nearest-neighbor search and computes the regret rate minimization query result set from the core set; as tuples are inserted into and deleted from the database, the nearest-neighbor relations and the core set are updated and the latest result set is computed from them, so that a series of real-time regret rate minimization query result sets is continuously returned to the user over a non-static database.

Description

Continuous regret rate minimization query method based on core set
Technical Field
The invention belongs to the technical field of databases, and particularly relates to a continuous regret rate minimization query method based on a core set.
Background
Extracting a few representative tuples from a database is an important function in many applications such as multi-criteria decision making, recommendation systems and web search. A regret rate minimization query selects a subset of fixed size so that the user is satisfied with the subset almost as much as with the entire database; because the result size is controllable and the user does not need to supply complex information, it has been widely used in the related fields. However, as information technology develops, static databases are no longer realistic, and many applications must work with non-static databases in which data tuples are inserted and deleted. The problem of how to select representative tuples also arises in this type of application. For example, in a restaurant reservation system, a restaurant's operating status, per-capita price, and so on may change over time: a restaurant opening or closing can be regarded as the insertion or deletion of a tuple in the database, while a change in per-capita price can be regarded as the modification of a tuple, which in turn can be treated as a deletion followed by an insertion. Clearly, the representative restaurants returned to the user differ at different points in time, so continuously obtaining a regret rate minimization query result set from a dynamically changing database to represent the real-time database becomes a problem that urgently needs to be solved.
For continuous regret rate minimization queries over a non-static database, the existing method [1] converts the query into a dynamic set cover problem. First, the preferences of a number of users are obtained by random sampling; if a tuple has the highest score under some preference, that tuple is said to cover the preference (or the preference is covered by the tuple). A set system consisting of a ground set and a collection of subsets is then built: the ground set is the randomly sampled preference set, and the set of preferences covered by each tuple in the database forms one element of the subset collection. The original problem thus becomes a set cover problem, namely selecting a fixed number of elements from the subset collection so that their union equals the ground set. The method performs a number of iterations; in each iteration it adds to the result set the tuple whose addition covers more preferences in total than the addition of any other tuple, finally obtaining a result set of fixed size. If the result set does not cover all sampled preferences, a smaller number of preferences is sampled anew and the process is repeated until a fixed-size result set covering all preferences is obtained and returned to the user. Afterwards, the set system and the result set are updated as tuples are inserted into or deleted from the database, which corresponds respectively to inserting and deleting elements of the subset collection, and the result set is updated so that it remains a set-cover solution of the new set system. In this way the method returns a series of regret rate minimization query result sets as the tuples in the database change. However, the method must update the set system after every tuple change, which makes it inefficient, and it must store the entire converted set system for subsequent updates; this huge set system occupies a large amount of space.
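For illustration only, the greedy selection step of this set-cover formulation can be sketched in Python as follows; this is a simplified sketch, not the full dynamic algorithm of [1], the random sampling of preferences and the dynamic updates are omitted, and the names preferences and covers are assumptions introduced here:

def greedy_set_cover(preferences, covers, k):
    # covers[t] is the set of sampled preferences under which tuple t scores highest;
    # greedily pick up to k tuples so that the union of covered preferences grows fastest.
    chosen, covered = [], set()
    while len(chosen) < k and covered != set(preferences):
        best = max((t for t in covers if t not in chosen),
                   key=lambda t: len(covers[t] - covered), default=None)
        if best is None:
            break
        chosen.append(best)
        covered |= covers[best]
    return chosen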
The efficiency of the solution strongly affects continuous regret rate minimization queries over a non-static database: if the method is too slow, a previous change may not have been fully processed when a new change arrives, which can cause congestion or even crash the system, and high space consumption likewise places heavy demands on the hardware that runs the method. The present invention combines continuous regret rate minimization queries over a non-static database with a core set, omits the update operations that do not affect the core set, solves the problem efficiently, stores only the information related to the core set, and therefore occupies few space resources.
The document mentioned above is derived from the following article:
[1] Yanhao Wang, Yuchen Li, Raymond Chi-Wing Wong, Kian-Lee Tan. A Fully Dynamic Algorithm for k-Regret Minimizing Sets. In Proceedings of the 37th International Conference on Data Engineering (ICDE), pages 1631-1642, 2021.
Disclosure of the Invention
The invention aims to provide a continuous regret rate minimization query method based on a core set, which builds a core set on the initial data set by introducing a core set technique and using nearest-neighbor search, computes a regret rate minimization query result set from the core set, updates the nearest-neighbor relations and the core set as tuples are inserted into and deleted from the database, and computes the latest result set from them, thereby efficiently and continuously returning a series of real-time regret rate minimization query result sets to the user over a non-static database.
In order to achieve the above purpose, the solution of the invention is:
a continuous regret rate minimization query method based on a core set comprises the following steps:
step 1, carrying out normalization processing on an original data set D with dimension d, so that the attribute values of all tuples in the original data set D lie in the [0,1] interval;
step 2, constructing a core set C of original data;
step 3, for each point p in the core set, respectively setting up and calculating the set U_p of points in N whose nearest neighbor is p, wherein N is a set of points randomly sampled on the part of the surface lying in the non-negative orthant of a d-dimensional sphere of fixed radius r centered at the origin;
step 4, calculating the regret rate minimization query result set R;
step 5, waiting for a tuple change in the database and preparing the corresponding processing; if no tuple insertion or deletion occurs in the database, the query process ends, otherwise step 6 is executed;
and step 6, maintaining the core set and the regret rate minimization query result set.
The specific content of step 1 is as follows: firstly, searching for and recording the maximum attribute value in each dimension over all tuples of the original data set D and of the change sequence, denoted m_1, m_2, …, m_d respectively; then assigning to each dimension of every tuple its original attribute value divided by the recorded maximum of the corresponding dimension, i.e.
p[i] ← p[i] / m_i, i = 1, …, d,
wherein p[i] represents the attribute value of tuple p in the i-th dimension; after this, the attribute values of all tuples lie in the [0,1] interval.
The specific content of step 2 is as follows:
2-1, letting C = ∅;
2-2, sampling a number of points u on the part of the surface lying in the non-negative orthant of the d-dimensional sphere of fixed radius r centered at the coordinate origin, the set formed by these points being denoted N; specifically, firstly obtaining d non-negative random numbers, denoted u_1, u_2, …, u_d, and then scaling them onto the sphere, i.e. letting
u_i ← r · u_i / √(u_1² + u_2² + … + u_d²), i = 1, …, d,
so that the tuple (u_1, u_2, …, u_d) represents one point randomly sampled on the part of the surface of the d-dimensional sphere of radius r lying in the non-negative orthant; adding the obtained point to the set N and repeating this process several times, finally obtaining a set N of points randomly sampled on that spherical surface;
2-3, for every point u in N, respectively searching for and recording its nearest neighbor NN(u) in the normalized data, i.e. the point of the normalized initial data with the smallest Euclidean distance to u, namely letting
NN(u) = argmin_{p∈D} ||u - p||,
and adding it to the core set, finally obtaining the core set C = ∪_{u∈N} NN(u).
The specific process of step 3 is as follows: for a given point p, traversing all points u in N and judging whether p is the nearest neighbor of u; if so, i.e. p = NN(u), adding u to U_p.
The specific process of step 4 is as follows:
4-1, letting R = ∅;
4-2, calculating the coverage value cov(R) of the result set as the size of the union of the sets U_p of all points in the result set, i.e. cov(R) = |∪_{p∈R} U_p|; cov(R) indicates how many points of N have their nearest neighbor in R;
4-3, traversing all points p in the core set and finding the point p′ whose addition increases the coverage value of the result set more than the addition of any other point, namely:
p′ = argmax_{p∈C} (cov(R ∪ {p}) - cov(R)),
and adding the point p′ to the result set;
4-4, repeating steps 4-2 to 4-3 until the size of the result set equals the size specified by the user, then going to step 4-5;
and 4-5, recording the regret rate minimization query result set obtained in step 4-4 and returning it to the user.
In step 4-3, if several points achieve the maximum coverage value increase at the same time, the point is selected according to subscript order.
The specific content of step 6 is:
6-1, judging, for the two cases of tuple insertion and tuple deletion respectively, whether the core set needs maintenance; if so, going to step 6-2, and if not, going to step 5;
6-2, adjusting the core set and the sets U_p according to the new nearest-neighbor relations, and then going to step 4.
In step 6, for tuple insertion, the specific process is as follows:
A6-1, normalizing the tuple p, setting up a set I and initializing it to be empty; for every point u in N, if the distance from u to its recorded nearest neighbor is greater than the distance from u to p, adding u to I; if I is not empty, the core set needs maintenance and the process goes to step A6-2; if I is empty, no maintenance is needed and the process goes to step 5;
A6-2, inserting the tuple p into the core set and letting U_p = I; for every point u in I, doing the following respectively: letting q = NN(u) be the previously recorded nearest neighbor of u, removing u from U_q and adding it to U_p, judging whether U_q is now an empty set, and if so removing q from the core set.
In step 6, for the case of tuple deletion, the specific process is as follows:
B6-1, judging whether the tuple p is in the core set; if so, the core set needs maintenance and the process goes to step B6-2; otherwise no maintenance is needed and the process goes to step 5;
B6-2, removing p from the core set; the original nearest neighbor of every point in U_p has been deleted, so doing the following for each such point u: searching for its nearest neighbor in the new data set and denoting it q, removing u from U_p and adding it to U_q, judging whether q is already in the core set, and if not adding q to the core set.
Compared with the prior art, the above scheme of the invention has the following beneficial effects:
(1) By updating the regret rate minimization query result set on the basis of core set maintenance, the method omits the update operations that do not affect the core set; compared with the prior art, which must update the set system for every change, this greatly reduces the average processing time per data change and completes continuous regret rate minimization queries more efficiently;
(2) The prior art must store a huge set system during processing, which occupies a large amount of storage space; in contrast, the present invention stores only the information related to the core set and has no such large storage requirement.
Drawings
FIG. 1 is an overall flow diagram of the present invention;
FIG. 2 is a schematic diagram of a core set construction process for a 2-dimensional case;
points on the 1/4 circle in the figure represent the randomly sampled points u; points inside the square frame represent the normalized tuples of the database; a connecting line represents the nearest-neighbor relation between two points, the nearest neighbor of the starting point u being the end point NN(u); points circled with a dotted line represent the points selected into the core set;
FIG. 3 is a flow diagram of computing a result set;
FIG. 4 is a diagram illustrating the nearest-neighbor relations between the relevant points in an embodiment of the invention;
wherein u_i denotes the 5 randomly sampled points of the embodiment, which lie on the 1/4 circle of radius r centered at the origin in the non-negative quadrant; p_i denotes the normalized tuples of the database, which lie inside the square frame formed by the coordinate axes and the dotted lines because their coordinates are all in the [0,1] interval; a solid connecting line represents a nearest-neighbor relation between two points, the nearest neighbor of the starting point being the end point; a dashed connecting line represents a nearest-neighbor relation deleted because of a tuple change; a dotted connecting line represents a nearest-neighbor relation added by the current tuple change;
in FIG. 4, (a) shows the nearest-neighbor relations on the initial data set, (b) shows the nearest-neighbor relations after inserting p_6, (c) shows the nearest-neighbor relations after inserting p_7, and (d) shows the nearest-neighbor relations after deleting p_4 and p_2.
Detailed Description
The invention provides a continuous regret rate minimization query method based on a core set: an initial core set is first constructed from the initial database, and an initial regret rate minimization query result set is calculated from it; the database is then monitored for tuple changes, including the insertion of new tuples and the deletion of existing tuples, and for each tuple change the core set and the result set are maintained and updated, so that continuous regret rate minimization queries over a non-static database are completed efficiently.
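As an illustration of this overall flow, the following Python sketch shows one possible way to organize the monitor-and-maintain loop; the function names normalize, build_coreset, compute_result_set, handle_insert and handle_delete are illustrative placeholders for steps 1-4 and 6 described below, and the (op, tuple) change format is an assumption introduced here:

# Minimal driver sketch of the overall flow in Fig. 1; the helper functions are
# hypothetical names standing in for the steps detailed in the text below.
def continuous_regret_query(initial_data, changes, k, num_samples, radius):
    data, maxima = normalize(initial_data, changes)                 # step 1
    coreset, nn, U, N = build_coreset(data, num_samples, radius)    # steps 2-3
    results = [compute_result_set(coreset, U, k)]                   # step 4: initial result set
    for op, raw in changes:                                         # step 5: wait for tuple changes
        t = tuple(raw[i] / maxima[i] for i in range(len(raw)))      # normalize the changed tuple
        if op == "insert":
            handle_insert(data, coreset, nn, U, t)                  # step 6, case a)
        else:
            handle_delete(data, coreset, nn, U, t)                  # step 6, case b)
        results.append(compute_result_set(coreset, U, k))           # return the latest result set
    return results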
As shown in fig. 1, the present invention comprises the steps of:
step 1, carrying out normalization processing on the original data set D with dimension d, so that the attribute values of all tuples in the original data set lie in the [0,1] interval;
the specific operation of step 1 is as follows: firstly, searching for and recording the maximum attribute value in each dimension over all tuples of the data set D and of the change sequence, denoted m_1, m_2, …, m_d respectively; then assigning to each dimension of every tuple its original attribute value divided by the recorded maximum of the corresponding dimension, i.e.
p[i] ← p[i] / m_i, i = 1, …, d,
wherein p[i] represents the attribute value of tuple p in the i-th dimension; after this conversion, the attribute values of all tuples lie in the [0,1] interval;
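A minimal Python sketch of this normalization step, assuming the data set and the change sequence are given as lists of numeric tuples (the function name normalize and the (op, tuple) change format are assumptions for illustration only):

def normalize(dataset, change_sequence=()):
    # Step 1: divide each attribute by the per-dimension maximum taken over the
    # initial data set and the change sequence, so all values fall into [0, 1].
    all_tuples = list(dataset) + [t for _, t in change_sequence]
    d = len(all_tuples[0])
    maxima = [max(t[i] for t in all_tuples) for i in range(d)]              # m_1, ..., m_d
    normalized = [tuple(t[i] / maxima[i] for i in range(d)) for t in dataset]
    return normalized, maxima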
step 2, constructing a core set C of the original data, and as shown in fig. 2, mainly including the following steps:
2-1, initializing the core set to an empty set, i.e. letting C = ∅;
2-2, sampling a number of points u on the part of the surface lying in the non-negative orthant of the d-dimensional sphere of fixed radius r centered at the coordinate origin, the set formed by these points being denoted N; specifically, firstly obtaining d non-negative random numbers, denoted u_1, u_2, …, u_d, and then scaling them onto the sphere, i.e. letting
u_i ← r · u_i / √(u_1² + u_2² + … + u_d²), i = 1, …, d,
so that the tuple (u_1, u_2, …, u_d) represents one point randomly sampled on the part of the surface of the d-dimensional sphere of radius r lying in the non-negative orthant; adding the obtained point to the set N and repeating this process several times, finally obtaining a set N of points randomly sampled on that spherical surface;
2-3, for every point u in N, respectively searching for and recording its nearest neighbor NN(u) in the normalized data, i.e. the point of the normalized initial data with the smallest Euclidean distance to u, namely letting
NN(u) = argmin_{p∈D} ||u - p||,
and adding it to the core set, finally obtaining the core set C = ∪_{u∈N} NN(u).
Step 3, for each point p in the core set, respectively setting up and calculating the set U_p = {u | u ∈ N ∧ p = NN(u)} of points in N whose nearest neighbor is p; the specific process is as follows: for a given point p, traverse all points u in N and judge, according to the nearest-neighbor relations recorded in step 2-3, whether p is the nearest neighbor of u; if so, i.e. p = NN(u), add u to U_p. Each point in the core set corresponds to one set U_p, and U_p represents the set of all points in N whose nearest neighbor is p;
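Steps 2 and 3 can be sketched in Python as follows; the sampling radius is passed in as a parameter r because its concrete value is given by a formula in the original drawings, and the random and math modules are used only to illustrate the sampling and the Euclidean nearest-neighbor search:

import math
import random

def sample_sphere_point(d, r):
    # Draw d non-negative random numbers and scale them onto the sphere of radius r
    # centered at the origin (surface portion in the non-negative orthant).
    v = [random.random() for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(r * x / norm for x in v)

def build_coreset(data, num_samples, r):
    # Steps 2-3: sample the set N, record each sample's nearest neighbor NN(u) in the
    # normalized data, and collect the sets U_p of samples represented by each point p.
    d = len(data[0])
    N = [sample_sphere_point(d, r) for _ in range(num_samples)]
    nn = {}                       # nn[u] = NN(u)
    U = {}                        # U[p] = {u in N : NN(u) = p}
    for u in N:
        p = min(data, key=lambda q: math.dist(u, q))   # Euclidean nearest neighbor
        nn[u] = p
        U.setdefault(p, set()).add(u)
    coreset = set(U)              # C = union of NN(u) over all u in N
    return coreset, nn, U, N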
step 4, calculating the regret rate minimization query result set R; referring to fig. 3, the process specifically comprises:
4-1, initializing the regret rate minimization query result set to an empty set, i.e. letting R = ∅;
4-2, calculating the coverage value cov(R) of the result set as the size of the union of the sets U_p of all points in the result set, i.e. cov(R) = |∪_{p∈R} U_p|; cov(R) indicates how many points of N have their nearest neighbor in R;
4-3, traversing all points p in the core set and finding the point p′ whose addition increases the coverage value of the result set more than the addition of any other point, namely:
p′ = argmax_{p∈C} (cov(R ∪ {p}) - cov(R)),
and adding the point p′ to the result set; if several points achieve the maximum coverage value increase at the same time, the point is selected according to subscript order, but when the number of randomly sampled points is large, i.e. the set N is large, this situation hardly ever occurs;
4-4, repeating steps 4-2 to 4-3 until the size of the result set equals the size specified by the user, then going to step 4-5;
4-5, recording the regret rate minimization query result set obtained in step 4-4 and returning it to the user;
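A Python sketch of this greedy maximum-coverage selection (step 4), reusing the coreset and U structures from the construction sketch above; ties are broken arbitrarily here, whereas the text breaks them by subscript order:

def compute_result_set(coreset, U, k):
    # Step 4: greedily pick up to k core-set points maximizing the number of
    # covered samples, cov(R) = |union of U_p over p in R|.
    R, covered = [], set()
    while len(R) < min(k, len(coreset)):
        # choose the point whose U_p contributes the most not-yet-covered samples
        best = max((p for p in coreset if p not in R),
                   key=lambda p: len(U.get(p, set()) - covered))
        R.append(best)
        covered |= U.get(best, set())
    return R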
step 5, waiting for a tuple change in the database and preparing the corresponding processing; if no tuple insertion or deletion occurs in the database, the query process ends, otherwise step 6 is executed;
step 6, maintaining the core set and the regret rate minimization query result set, which specifically comprises:
6-1, judging whether the core set needs maintenance according to whether a tuple was inserted or deleted, analyzed according to the following two cases; if the core set needs maintenance, executing step 6-2, and if not, going to step 5:
a) Tuple p is inserted: assign to each dimension of p its original attribute value divided by the maximum attribute value of the corresponding dimension recorded in step 1, i.e. normalize p; then set up a set I and initialize it to be empty; for every point u in N, if the distance from u to its recorded nearest neighbor is greater than the distance from u to p, i.e. ||u - NN(u)|| > ||u - p||, add u to I, so that I represents the set of points whose nearest neighbor is affected by the insertion of the tuple p, i.e. the points in I take p as their new nearest neighbor; if I is not empty, the core set needs maintenance, and if I is empty, no maintenance is needed;
b) Tuple p is deleted: if p is in the core set, the core set needs maintenance; otherwise no maintenance is needed.
6-2, adjusting the core set and the sets U_p according to the new nearest-neighbor relations, and then going to step 4; the adjustment is performed differently for the two cases of tuple insertion and deletion:
a) Tuple p is inserted: insert the point p into the core set and let U_p = I, where I is the set obtained in step 6-1; the new nearest neighbor of every point in I has become p, so do the following for each point u in I: let q = NN(u) be the previously recorded nearest neighbor of u, remove u from U_q and add it to U_p, then judge whether U_q is now an empty set, and if so remove q from the core set;
b) Tuple p is deleted: remove the point p from the core set; the original nearest neighbor of every point in U_p has been deleted, so do the following for each such point u: search for its nearest neighbor in the new data set and denote it q, remove u from U_p and add it to U_q, then judge whether q is already in the core set, and if not add q to the core set.
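The two maintenance cases of step 6 can be sketched in Python as follows, continuing the data structures used above (data is the list of normalized tuples, nn maps each sample u to NN(u), and U maps each core-set point p to U_p); this is a simplified sketch, and the re-computation of the result set after maintenance is left to the caller:

import math

def handle_insert(data, coreset, nn, U, p):
    # Step 6, case a): p is a newly inserted, already normalized tuple.
    data.append(p)
    I = [u for u in nn if math.dist(u, nn[u]) > math.dist(u, p)]   # samples now closer to p
    if not I:                      # the core set is unaffected, nothing to maintain
        return
    coreset.add(p)
    U[p] = set(I)
    for u in I:
        q = nn[u]                  # previously recorded nearest neighbor of u
        U[q].discard(u)
        nn[u] = p                  # p becomes the new nearest neighbor of u
        if not U[q]:               # q no longer represents any sample
            coreset.discard(q)
            del U[q]

def handle_delete(data, coreset, nn, U, p):
    # Step 6, case b): tuple p is removed from the database.
    data.remove(p)
    if p not in coreset:           # the core set is unaffected, nothing to maintain
        return
    coreset.discard(p)
    for u in U.pop(p, set()):      # re-attach every sample whose nearest neighbor was p
        q = min(data, key=lambda t: math.dist(u, t))   # new nearest neighbor of u
        nn[u] = q
        coreset.add(q)
        U.setdefault(q, set()).add(u)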
The technical solution of the present invention will be described in detail by a specific example.
This embodiment assumes that the data dimension equals 2, i.e. d = 2, and that the result set size required by the user is 2. Assume first that the initial database contains the tuples p_1, p_2, p_3, p_4, p_5, with the following attribute value information:
Table 1: attribute values of the initial tuples p_1 to p_5.
Assume that the database changes four times in total, in the following order: insert p_6(90, 105), insert p_7(160, 30), delete p_4, delete p_2.
Step (1): search for the maximum attribute value in each dimension over all tuples of the initial database and of the change sequence, obtaining m_1 = p_7[1] = 160 and m_2 = p_2[2] = 150; then assign to each dimension of every tuple its original attribute value divided by the recorded maximum of the corresponding dimension, e.g. p_1[1] = 100/160 = 0.625, p_2[1] = 45/160 ≈ 0.281, and so on for the remaining values. The normalized attribute value information of each tuple is then as follows:
Table 2: normalized attribute values of each tuple.
Step (2-1): let C = ∅.
Step (2-2): obtain d = 2 non-negative random numbers, say u_1 = 71 and u_2 = 29, and scale them onto the sampling sphere as described in step 2-2 to obtain one sampled point.
This is repeated several times (in this embodiment the number of samples is assumed to be 5), and the information of each point in the resulting set N is assumed to be as follows:
Table 3: coordinates of the sampled points u_1 to u_5 in N.
Step (2-3): ||u_1 - p_1|| = 1.814, ||u_1 - p_2|| = 2.498, ||u_1 - p_3|| = 2.085, ||u_1 - p_4|| = 1.955, ||u_1 - p_5|| = 2.273; comparing these values shows that p_1 is closest to u_1, so the nearest neighbor of u_1 is NN(u_1) = p_1, and p_1 is added to the core set. Similarly, the distance information between the remaining points is as follows:
Table 4: Euclidean distances between each sampled point u_i and each tuple p_j.
This yields NN(u_2) = p_2, NN(u_3) = p_3, NN(u_4) = p_1, NN(u_5) = p_2; the points and their nearest neighbors are shown in FIG. 4(a). The nearest neighbor recorded for each point at this time is as follows:
Table 5: NN(u_1) = p_1, NN(u_2) = p_2, NN(u_3) = p_3, NN(u_4) = p_1, NN(u_5) = p_2.
So the core set C = {p_1, p_2, p_3}.
Step (3): because p_1 = NN(u_1) and p_1 = NN(u_4), U_{p1} = {u_1, u_4}; similarly U_{p2} = {u_2, u_5} and U_{p3} = {u_3}. The U_p information of each point in the core set at this time is as follows:
Table 6: U_{p1} = {u_1, u_4}, U_{p2} = {u_2, u_5}, U_{p3} = {u_3}.
Step (4-1): let R = ∅.
Step (4-2): compute cov(R) = 0.
Step (4-3): cov(R ∪ {p_1}) - cov(R) = |U_{p1}| - 0 = 2, cov(R ∪ {p_2}) - cov(R) = |U_{p2}| - 0 = 2, cov(R ∪ {p_3}) - cov(R) = |U_{p3}| - 0 = 1; p_1 and p_2 achieve the maximum coverage increase at the same time, so p_1 is chosen by subscript order and added to the result set, at which point R = {p_1}.
Step (4-4): the size of R is 1 and the user-specified result set size is 2, so steps (4-2) to (4-3) are repeated.
Step (4-2): compute cov(R) = |U_{p1}| = 2.
Step (4-3): cov(R ∪ {p_1}) - cov(R) = |U_{p1} ∪ U_{p1}| - |U_{p1}| = 2 - 2 = 0, cov(R ∪ {p_2}) - cov(R) = |U_{p1} ∪ U_{p2}| - |U_{p1}| = 4 - 2 = 2, cov(R ∪ {p_3}) - cov(R) = |U_{p1} ∪ U_{p3}| - |U_{p1}| = 3 - 2 = 1, so p_2 is added to the result set, at which point R = {p_1, p_2}.
Step (4-4): the size of R is 2, equal to the result set size specified by the user, so go to step (4-5).
Step (4-5): record and return the current regret rate minimization query result set R = {p_1, p_2}.
Step (5): the database changes: p_6(90, 105) is inserted, so step (6-1) is executed.
Step (6-1): normalize the attribute values of p_6, obtaining p_6(90/160, 105/150), i.e. p_6(0.563, 0.7). The distances between p_6 and each point in N are as follows:
Table 7: distances between p_6 and each point in N.
Because ||u_1 - NN(u_1)|| = ||u_1 - p_1|| = 1.848 < ||u_1 - p_6|| = 1.979, u_1 is not added to I; similarly,
||u_2 - NN(u_2)|| = ||u_2 - p_2|| = 1.420 < ||u_2 - p_6|| = 1.804, so u_2 is not added to I;
||u_3 - NN(u_3)|| = ||u_3 - p_3|| = 1.565 > ||u_3 - p_6|| = 1.524, so u_3 is added to I;
||u_4 - NN(u_4)|| = ||u_4 - p_1|| = 1.722 > ||u_4 - p_6|| = 1.685, so u_4 is added to I;
||u_5 - NN(u_5)|| = ||u_5 - p_2|| = 1.464 < ||u_5 - p_6|| = 1.573, so u_5 is not added to I.
Thus I = {u_3, u_4}, which is not empty, so core set maintenance is needed and step (6-2) is executed.
Step (6-2): add p_6 to the core set and let U_{p6} = I = {u_3, u_4}; operate on each point in I separately. For u_3, q = NN(u_3) = p_3; remove u_3 from U_{p3}; U_{p3} is now an empty set, so p_3 is removed from the core set. For u_4, q = NN(u_4) = p_1; remove u_4 from U_{p1}; U_{p1} = {u_1} is not an empty set. The U_p information of each point in the core set at this time is as follows:
Table 8: U_{p1} = {u_1}, U_{p2} = {u_2, u_5}, U_{p6} = {u_3, u_4}.
The relation between the points and their nearest neighbors is shown in FIG. 4(b). The updated nearest neighbors at this time are as follows:
Table 9: NN(u_1) = p_1, NN(u_2) = p_2, NN(u_3) = p_6, NN(u_4) = p_6, NN(u_5) = p_2.
Then the process proceeds to step (4a).
Step (4a) is executed in the same way as step (4) above and is not repeated; the regret rate minimization query result set at this time, R = {p_2, p_6}, is returned.
(The first change has now been processed.)
Step (5a): the database changes: p_7(160, 30) is inserted, so step (6-1a) is executed.
Step (6-1a): normalize the attribute values of p_7, obtaining p_7(160/160, 30/150), i.e. p_7(1, 0.2). The distances between p_7 and each point in N are as follows:
Table 10: distances between p_7 and each point in N.
Because ||u_1 - NN(u_1)|| = ||u_1 - p_1|| = 1.848 > ||u_1 - p_7|| = 1.428, u_1 is added to I; similarly,
||u_2 - NN(u_2)|| = ||u_2 - p_2|| = 1.420 < ||u_2 - p_7|| = 2.429, so u_2 is not added to I;
||u_3 - NN(u_3)|| = ||u_3 - p_6|| = 1.524 < ||u_3 - p_7|| = 1.665, so u_3 is not added to I;
||u_4 - NN(u_4)|| = ||u_4 - p_6|| = 1.685 > ||u_4 - p_7|| = 1.425, so u_4 is added to I;
||u_5 - NN(u_5)|| = ||u_5 - p_2|| = 1.464 < ||u_5 - p_7|| = 2.036, so u_5 is not added to I.
Thus I = {u_1, u_4}, which is not empty, so core set maintenance is needed and step (6-2a) is executed.
Step (6-2a): add p_7 to the core set and let U_{p7} = I = {u_1, u_4}; operate on each point in I separately. For u_1, q = NN(u_1) = p_1; remove u_1 from U_{p1}; U_{p1} is now an empty set, so p_1 is removed from the core set. For u_4, q = NN(u_4) = p_6; remove u_4 from U_{p6}; U_{p6} = {u_3} is not an empty set. The U_p information of each point in the core set at this time is as follows:
Table 11: U_{p2} = {u_2, u_5}, U_{p6} = {u_3}, U_{p7} = {u_1, u_4}.
The relation between the points and their nearest neighbors is shown in FIG. 4(c). The updated nearest neighbors at this time are as follows:
Table 12: NN(u_1) = p_7, NN(u_2) = p_2, NN(u_3) = p_6, NN(u_4) = p_7, NN(u_5) = p_2.
Then the process proceeds to step (4b).
Step (4b) is executed in the same way as step (4) above; the regret rate minimization query result set at this time, R = {p_2, p_7}, is returned.
(The second change has now been processed.)
Step (5b): the database changes: p_4 is deleted, so step (6-1b) is executed.
Step (6-1b): because p_4 is not in the core set, no core set maintenance is needed, and step (5c) is executed.
(The third change has now been processed.)
Step (5c): the database changes: p_2 is deleted, so step (6-1c) is executed.
Step (6-1c): p_2 is in the core set, so core set maintenance is needed and step (6-2c) is executed.
Step (6-2c): remove p_2 from the core set; U_{p2} = {u_2, u_5}, and the nearest neighbor of each of these points is searched for again. The distance information of the relevant points is as follows:
Table 13: distances between u_2, u_5 and the remaining tuples.
Thus, for u_2, searching for its nearest neighbor in the latest data set gives NN(u_2) = p_5; since p_5 is not in the core set, p_5 is added to the core set and u_2 is added to U_{p5}. For u_5, searching for its nearest neighbor in the latest data set gives NN(u_5) = p_5; p_5 is already in the core set (it was added when u_2 was processed), so u_5 is added to U_{p5}. The U_p information of each point in the core set at this time is as follows:
Table 14: U_{p5} = {u_2, u_5}, U_{p6} = {u_3}, U_{p7} = {u_1, u_4}.
The relation between the points and their nearest neighbors is shown in FIG. 4(d). The updated nearest neighbors at this time are as follows:
Table 15: NN(u_1) = p_7, NN(u_2) = p_5, NN(u_3) = p_6, NN(u_4) = p_7, NN(u_5) = p_5.
Then the process proceeds to step (4d).
Step (4d) is executed in the same way as step (4) above; the regret rate minimization query result set at this time, R = {p_5, p_7}, is returned.
(The fourth change has now been processed.)
Step (5d): the database does not change any more, and the method ends.
In summary, the continuous regret rate minimization query method based on a core set of the present invention considers the influence of tuple insertions and deletions in the database on the regret rate minimization query result, constructs a core set based on nearest-neighbor search, and performs regret rate minimization queries with the core set and the maximum coverage method. On the basis of the initial core set and the initial regret rate minimization query result set, and given that the insertion or deletion of a tuple in the database may change the core set and the result set, once a tuple is inserted or deleted the method judges whether the core set needs to change and triggers core set maintenance and result set update, so as to meet the requirement of continuous regret rate minimization queries. The invention uses the nearest-neighbor relations in the core set to adjust the core set rapidly, effectively reducing the maintenance time and improving the efficiency of continuous regret rate minimization queries over a non-static database.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modifications made on the basis of the technical scheme according to the technical idea of the present invention fall within the protection scope of the present invention.

Claims (7)

1. A continuous regret rate minimization query method based on a core set is characterized by comprising the following steps:
step 1, carrying out normalization processing on an original data set D with dimension d, so that the attribute values of all tuples in the original data set D lie in the [0,1] interval;
step 2, constructing a core set C of original data;
step 3, for each point p in the core set, respectively setting up and calculating the set U_p of points in N whose nearest neighbor is p, wherein N is a set of points randomly sampled on the part of the surface lying in the non-negative orthant of a d-dimensional sphere of fixed radius r centered at the origin;
step 4, calculating the regret rate minimization query result set R;
the specific process of step 4 is as follows:
4-1, letting R = ∅;
4-2, calculating the coverage value cov(R) of the result set as the size of the union of the sets U_p of all points in the result set, i.e. cov(R) = |∪_{p∈R} U_p|, wherein cov(R) indicates how many points of N have their nearest neighbor in R;
4-3, traversing all points p in the core set and finding the point p′ whose addition increases the coverage value of the result set more than the addition of any other point, namely:
p′ = argmax_{p∈C} (cov(R ∪ {p}) - cov(R)),
and adding the point p′ to the result set;
4-4, repeating steps 4-2 to 4-3 until the size of the result set equals the size specified by the user, then going to step 4-5;
4-5, recording the regret rate minimization query result set obtained in step 4-4 and returning it to the user;
step 5, waiting for a tuple change in the database and preparing the corresponding processing; if no tuple insertion or deletion occurs in the database, the query process ends, otherwise step 6 is executed;
step 6, maintaining the core set and the regret rate minimization query result set;
the specific content of step 6 is as follows:
6-1, judging, for the two cases of tuple insertion and tuple deletion respectively, whether the core set needs maintenance; if so, going to step 6-2, and if not, going to step 5;
6-2, adjusting the core set and the sets U_p according to the new nearest-neighbor relations, and then going to step 4.
2. The core-set-based continuous regret rate minimization query method of claim 1, wherein the specific content of step 1 is as follows: firstly, searching for and recording the maximum attribute value in each dimension over all tuples of the original data set D and of the change sequence, denoted m_1, m_2, …, m_d respectively; then assigning to each dimension of every tuple its original attribute value divided by the recorded maximum of the corresponding dimension, i.e.
s[i] ← s[i] / m_i, i = 1, …, d,
wherein s[i] represents the attribute value of tuple s in the i-th dimension; after this, the attribute values of all tuples lie in the [0,1] interval.
3. The core-set-based continuous regret rate minimization query method of claim 1, wherein the specific content of step 2 is as follows:
2-1, letting C = ∅;
2-2, sampling a number of points u on the part of the surface lying in the non-negative orthant of the d-dimensional sphere of fixed radius r centered at the coordinate origin, the set formed by these points being denoted N; specifically, firstly obtaining d non-negative random numbers, denoted u_1, u_2, …, u_d, and then scaling them onto the sphere, i.e. letting
u_i ← r · u_i / √(u_1² + u_2² + … + u_d²), i = 1, …, d,
so that the tuple (u_1, u_2, …, u_d) represents one point randomly sampled on the part of the surface of the d-dimensional sphere of radius r lying in the non-negative orthant; adding the obtained point to the set N and repeating this process several times, finally obtaining a set N of points randomly sampled on that spherical surface;
2-3, for every point u in N, respectively searching for and recording its nearest neighbor NN(u) in the normalized data, i.e. the point of the normalized initial data with the smallest Euclidean distance to u, namely letting
NN(u) = argmin_{s∈D} √(Σ_{i=1}^{d} (u[i] - s[i])²),
adding it to the core set, and finally obtaining the core set C = ∪_{u∈N} NN(u), wherein s[i] represents the attribute value of tuple s in the i-th dimension.
4. The core-set-based continuous regret rate minimization query method of claim 1, wherein the specific process of step 3 is as follows: for a given point p, traversing all points u in N and judging whether p is the nearest neighbor of u; if so, i.e. p = NN(u), adding u to U_p.
5. The core-set-based continuous regret rate minimization query method of claim 1, wherein in step 4-3, if several points achieve the maximum coverage value increase at the same time, the point is selected according to subscript order.
6. The core-set-based continuous regret rate minimization query method of claim 1, wherein in step 6, for the case of tuple insertion, the specific process is as follows:
A6-1, normalizing the tuple p, setting up a set I and initializing it to be empty; for every point u in N, if the distance from u to its recorded nearest neighbor is greater than the distance from u to p, adding u to I; if I is not empty, the core set needs maintenance and the process goes to step A6-2; if I is empty, no maintenance is needed and the process goes to step 5;
A6-2, inserting the tuple p into the core set and letting U_p = I; for every point u in I, doing the following respectively: letting q = NN(u) be the previously recorded nearest neighbor of u, removing u from U_q and adding it to U_p, judging whether U_q is now an empty set, and if so removing q from the core set.
7. The core-set-based continuous regret rate minimization query method of claim 1, wherein in step 6, for the case of tuple deletion, the specific process is as follows:
B6-1, judging whether the tuple p is in the core set; if so, the core set needs maintenance and the process goes to step B6-2; otherwise no maintenance is needed and the process goes to step 5;
B6-2, removing p from the core set; the original nearest neighbor of every point in U_p has been deleted, so doing the following for each such point u: searching for its nearest neighbor in the new data set and denoting it q, removing u from U_p and adding it to U_q, judging whether q is already in the core set, and if not adding q to the core set.
CN202110770688.1A 2021-07-07 2021-07-07 Continuous regret rate minimization query method based on core set Active CN113448994B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110770688.1A CN113448994B (en) 2021-07-07 2021-07-07 Continuous regret rate minimization query method based on core set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110770688.1A CN113448994B (en) 2021-07-07 2021-07-07 Continuous regret rate minimization query method based on core set

Publications (2)

Publication Number Publication Date
CN113448994A CN113448994A (en) 2021-09-28
CN113448994B true CN113448994B (en) 2023-02-03

Family

ID=77815406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110770688.1A Active CN113448994B (en) 2021-07-07 2021-07-07 Continuous regret rate minimization query method based on core set

Country Status (1)

Country Link
CN (1) CN113448994B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115563155A (en) * 2022-10-10 2023-01-03 深圳计算科学研究院 Incremental query method based on linear programming

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649489A (en) * 2016-09-28 2017-05-10 南京航空航天大学 Continuous skyline query processing mechanism in geographic text information data
CN108932251A (en) * 2017-05-25 2018-12-04 郑州大学 A sorting-based k-dominant Skyline search algorithm over frequently updated data sets
US10200814B1 (en) * 2018-04-24 2019-02-05 The Florida International University Board Of Trustees Voronoi diagram-based algorithm for efficient progressive continuous k-nearest neighbor query for moving objects

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112691383A (en) * 2021-01-14 2021-04-23 上海交通大学 Texas poker AI training method based on virtual regret minimization algorithm

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649489A (en) * 2016-09-28 2017-05-10 南京航空航天大学 Continuous skyline query processing mechanism in geographic text information data
CN108932251A (en) * 2017-05-25 2018-12-04 郑州大学 A sorting-based k-dominant Skyline search algorithm over frequently updated data sets
US10200814B1 (en) * 2018-04-24 2019-02-05 The Florida International University Board Of Trustees Voronoi diagram-based algorithm for efficient progressive continuous k-nearest neighbor query for moving objects

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A Fully Dynamic Algorithm for k-Regret Minimizing Sets; Yanhao Wang et al.; https://arxiv.org/pdf/2005.14493.pdf; 2020-12-31; pages 1-15 *
Efficient Computation of Regret-ratio Minimizing Set: A Compact Maxima Representative; Abolfazl Asudeh et al.; SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data; 2017-05-31; pages 821-834 *
Faster Algorithms for k-Regret Minimizing Sets via Monotonicity and Sampling; Qi Dong et al.; CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management; 2019-11-30; pages 2213-2216 *
Regret-minimizing representative databases; Danupon Nanongkai et al.; Proceedings of the VLDB Endowment; 2010-09-30; vol. 3; pages 1114-1124 *

Also Published As

Publication number Publication date
CN113448994A (en) 2021-09-28

Similar Documents

Publication Publication Date Title
CN108241745B (en) Sample set processing method and device and sample query method and device
US10102253B2 (en) Minimizing index maintenance costs for database storage regions using hybrid zone maps and indices
CN110134714B (en) Distributed computing framework cache index method suitable for big data iterative computation
US10754887B1 (en) Systems and methods for multimedia image clustering
CN107341178B (en) Data retrieval method based on self-adaptive binary quantization Hash coding
CN109829066B (en) Local sensitive Hash image indexing method based on hierarchical structure
US20180357281A1 (en) Class specific context aware query processing
US20220005546A1 (en) Non-redundant gene set clustering method and system, and electronic device
Tang et al. Efficient Processing of Hamming-Distance-Based Similarity-Search Queries Over MapReduce.
US10642918B2 (en) Efficient publish/subscribe systems
CN111552710A (en) Query optimization method for distributed database
CN110334290B (en) MF-Octree-based spatio-temporal data rapid retrieval method
CN110888880A (en) Proximity analysis method, device, equipment and medium based on spatial index
CN113448994B (en) Continuous regrettage minimization query method based on core set
Huang et al. A clustering based approach for skyline diversity
CN110069500B (en) Dynamic mixed indexing method for non-relational database
CN108549696B (en) Time series data similarity query method based on memory calculation
JP2010277329A (en) Neighborhood retrieval device
CN110209895B (en) Vector retrieval method, device and equipment
CN112162986B (en) Parallel top-k range skyline query method and system
Zhou et al. A novel locality-sensitive hashing algorithm for similarity searches on large-scale hyperspectral data
Lu et al. Dynamic Partition Forest: An Efficient and Distributed Indexing Scheme for Similarity Search based on Hashing
Lin Efficient and compact indexing structure for processing of spatial queries in line-based databases
US11822582B2 (en) Metadata clustering
CN116304253B (en) Data storage method, data retrieval method and method for identifying similar video

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant