CN113448994A - Continuous regret-ratio minimization query method based on core set - Google Patents

Continuous regret-ratio minimization query method based on core set

Info

Publication number
CN113448994A
Authority
CN
China
Prior art keywords: points, tuple, core set, core, minimization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110770688.1A
Other languages: Chinese (zh)
Other versions: CN113448994B (en)
Inventor
郑吉平
马炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202110770688.1A
Publication of CN113448994A
Application granted
Publication of CN113448994B
Active legal status
Anticipated expiration legal status

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/24: Querying
    • G06F 16/245: Query processing
    • G06F 16/2453: Query optimisation
    • G06F 16/2455: Query execution

Abstract

The invention discloses a continuous regret-ratio minimization query method based on a core set. The method first constructs an initial core set from the initial database and computes an initial regret-ratio minimization query result set; it then monitors the database for tuple changes, namely insertions of new tuples and deletions of existing tuples, and maintains the core set and the result set after each change, so that continuous regret-ratio minimization queries can be completed efficiently over a non-static database. By introducing the core-set technique and using nearest-neighbor search, a core set is constructed over the initial data set and a regret-ratio minimization query result set is computed from it; as tuples are inserted into and deleted from the database, the nearest-neighbor relation and the core set are updated and the latest result set is recomputed from them, so that a series of real-time regret-ratio minimization query result sets can be returned to the user continuously over a non-static database.

Description

Continuous regret-ratio minimization query method based on core set
Technical Field
The invention belongs to the technical field of databases, and particularly relates to a continuous regret-ratio minimization query method based on a core set.
Background
Extracting a few representative tuples from a database is an important function in many applications such as multi-criteria decision making, recommendation systems and web search. A regret-ratio minimization query selects a subset of fixed size so that the user is almost as satisfied with the subset as with the entire database; because the result size is controllable and the user need not supply complex preference information, it has been widely used in the related arts. However, as information technology develops, static databases are no longer realistic, and many applications require non-static databases in which data tuples are inserted and deleted. The problem of selecting representative tuples also arises in this type of application. For example, in a restaurant reservation system, a restaurant's operating status, per-capita price, etc. may change over time: a restaurant opening or closing can be regarded as the insertion or deletion of a tuple in the database, while a change in per-capita price can be regarded as a modification of a tuple, i.e. a deletion followed by an insertion. Clearly, the representative restaurants returned to the user differ at different points in time, so how to continuously obtain a regret-ratio minimization query result set from a dynamically changing database, to represent the real-time database, becomes an urgent problem.
For continuous regret-ratio minimization queries over a non-static database, the existing method [1] converts the query into a dynamic set-cover problem. First, the preferences of many users are obtained by random sampling; if a tuple has the highest score under a preference, the tuple is said to cover that preference. A set system consisting of a ground set and a collection of subsets is then constructed, where the ground set is the randomly sampled preference set and, for each tuple in the database, the set of preferences it covers forms one element of the subset collection. The original problem thus becomes a set-cover problem: select a fixed number of elements from the subset collection so that their union equals the ground set. The method performs a number of greedy iterations, each time adding to the result set the tuple whose addition covers a larger total number of preferences than the addition of any other tuple, until a result set of fixed size is obtained. If the result set does not cover all sampled preferences, a smaller number of preferences is randomly sampled and the process is repeated until a fixed-size result set covering all preferences is found, which is returned to the user. Afterwards, the set system and the result set are updated as tuples are inserted into or deleted from the database, which corresponds respectively to inserting and deleting elements of the subset collection, and the result set is updated so that it remains a set cover of the new set system. In this way the method returns a series of regret-ratio minimization query result sets as the tuples in the database change.
However, this method must update the set system after every tuple change, which makes it inefficient, and it must store the entire converted set system for subsequent updates, so the huge set system occupies a large amount of space.
The efficiency of the solution greatly affects continuous regret-ratio minimization queries over a non-static database: if the method is too slow, a previous change may not be fully processed when a new change arrives, which can cause congestion or even crash the system, and high space consumption places greater demands on the hardware running the method. The invention combines continuous regret-ratio minimization queries over a non-static database with the core set, omits the update operations that do not affect the core set, solves the problem efficiently, stores only the information related to the core set, and thus occupies little space.
The documents mentioned above originate from the following articles:
[1] Yanhao Wang, Yuchen Li, Raymond Chi-Wing Wong, Kian-Lee Tan. A Fully Dynamic Algorithm for k-Regret Minimizing Sets. In Proceedings of the 37th International Conference on Data Engineering (ICDE), pages 1631-1642, 2021.
Disclosure of Invention
The invention aims to provide a continuous regret-ratio minimization query method based on a core set, which constructs the core set over the initial data set by introducing the core-set technique and using nearest-neighbor search, computes a regret-ratio minimization query result set from the core set, updates the nearest-neighbor relation and the core set as tuples are inserted into and deleted from the database, and recomputes the latest result set from them, thereby efficiently and continuously returning a series of real-time regret-ratio minimization query result sets to the user over a non-static database.
In order to achieve the above purpose, the solution of the invention is:
a continuous regret-ratio minimization query method based on a core set comprises the following steps:
step 1, normalizing the original data set D of dimension d so that the attribute values of all tuples in D lie in the [0,1] interval;
step 2, constructing a core set C of original data;
step 3, for each point p in the core set, define and compute the set U_p of points in N whose nearest neighbor is p, where N is a set of points randomly sampled on the part of the surface, lying in the non-negative quadrant, of the d-dimensional sphere of radius √d centered at the origin;
step 4, calculating a regret-ratio minimization query result set R;
step 5, waiting for a tuple change in the database and preparing to process it; if no insertion or deletion of a tuple occurs in the database, the query process ends, otherwise step 6 is executed;
and 6, maintaining the core set and the regret-ratio minimization query result set.
The specific content of step 1 is as follows: first, search for and record the maximum attribute value in each dimension over all tuples of the original data set D and of the change sequence, denoted m_1, m_2, …, m_d respectively; then set each attribute value of every tuple to the original value divided by the recorded maximum of the corresponding dimension, i.e. p[i] ← p[i]/m_i, where p[i] denotes the attribute value of tuple p in the i-th dimension. Afterwards, the attribute values of all tuples lie in the [0,1] interval.
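To make step 1 concrete, here is a minimal Python sketch (the helper name, the tuple values and the representation of the change sequence are illustrative assumptions, not from the patent); note that, as the step specifies, the per-dimension maxima are taken over the initial data set together with the tuples of the change sequence:

```python
def normalize(dataset, changes):
    """Scale every attribute of every tuple into [0, 1] (step 1 sketch).

    The maximum m_i of each dimension i is recorded over the initial
    dataset and over the tuples appearing in the change sequence, so
    that later insertions also fall inside [0, 1].
    """
    d = len(dataset[0])
    pool = dataset + [t for op, t in changes if op == "insert"]
    maxima = [max(t[i] for t in pool) for i in range(d)]
    normalized = [tuple(t[i] / maxima[i] for i in range(d)) for t in dataset]
    return normalized, maxima

# Hypothetical tuples: the recorded maxima become 160 and 150.
data, maxima = normalize([(100, 45), (48, 150)], [("insert", (160, 30))])
```

With these values, the tuple (100, 45) is mapped to (0.625, 0.3).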
The specific content of step 2 is as follows:
2-1, let C = ∅;
2-2, on the part of the surface, lying in the non-negative quadrant, of the d-dimensional sphere of radius √d centered at the origin of coordinates, sample a number of points u, and denote the set they form by N; specifically, first obtain d non-negative random numbers, denoted u_1, u_2, …, u_d, then let u_i ← √d · u_i / √(u_1² + u_2² + … + u_d²), so that the tuple (u_1, u_2, …, u_d) represents a point randomly sampled on that part of the sphere surface; add the obtained point to the set N and repeat the process several times, finally obtaining a set N of points randomly sampled on the part of the surface, lying in the non-negative quadrant, of the d-dimensional sphere of radius √d;
2-3, for every point u in N, search for and record its nearest neighbor NN(u) in the normalized data, i.e. the point of the normalized initial data closest to u in Euclidean distance, namely NN(u) = argmin_{p∈D} ||u − p||, and add it to the core set, obtaining the core set C = ∪_{u∈N} NN(u).
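A minimal sketch of steps 2-1 to 2-3, assuming the data has already been normalized into [0,1]^d; the function names and the use of Python's `random` and `math` modules are illustrative, and the rescaling of a random non-negative vector onto the sphere of radius √d follows the construction described above:

```python
import math
import random

def sample_sphere_point(d, rng=random):
    # Draw d non-negative random numbers and rescale the vector so it
    # lies on the surface of the d-dimensional sphere of radius sqrt(d),
    # inside the non-negative quadrant (step 2-2).
    v = [rng.random() for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(math.sqrt(d) * x / norm for x in v)

def build_coreset(data, num_samples, d, rng=random):
    # Step 2-1: C starts empty; step 2-3: C is the union of NN(u), u in N.
    N = [sample_sphere_point(d, rng) for _ in range(num_samples)]
    nn = {u: min(data, key=lambda p: math.dist(u, p)) for u in N}
    C = set(nn.values())
    return N, nn, C
```

Every sampled u then satisfies ||u|| = √d, and the core set C is typically much smaller than the data set.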
The specific process of step 3 is as follows: for each point p in the core set, traverse all points u in N and judge whether p is the nearest neighbor of u; if so, i.e. p = NN(u), add u to U_p.
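Step 3 is a simple inversion of the recorded nearest-neighbor mapping; a sketch follows, where string labels stand in for the actual points and mirror the relations used later in the embodiment:

```python
def group_by_nearest(nn):
    # U_p = { u in N | NN(u) = p }, one entry per core-set point (step 3).
    U = {}
    for u, p in nn.items():
        U.setdefault(p, []).append(u)
    return U

# Nearest-neighbor relations of the form NN(u) = p.
U = group_by_nearest({"u1": "p1", "u2": "p2", "u3": "p3",
                      "u4": "p1", "u5": "p2"})
```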
The specific process of step 4 is as follows:
4-1, let R = ∅;
4-2, calculate the coverage value cov(R) of the result set as the size of the union of the sets U_p over all points in the result set, i.e. cov(R) = |∪_{p∈R} U_p|; cov(R) indicates how many points of N have their nearest neighbor in R;
4-3, traverse all points p in the core set and find the point p′ whose addition increases the coverage value of the result set more than the addition of any other point, namely:
p′ = argmax_{p∈C} (cov(R ∪ {p}) − cov(R)),
and add p′ to the result set;
4-4, repeat steps 4-2 to 4-3 until the size of the result set equals the size specified by the user, then go to step 4-5;
4-5, record the regret-ratio minimization query result set obtained in step 4-4 and return it to the user.
In step 4-3, if several points attain the maximum coverage-value increment simultaneously, the point is selected in subscript order.
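Steps 4-1 to 4-5 form a standard greedy maximum-coverage loop over the sets U_p; a sketch in Python, where C is an ordered list so that ties are broken in subscript order as step 4-3 specifies, and the sample sets mirror Table 6 of the embodiment:

```python
def regret_query(C, U, k):
    # Greedily add the core-set point whose U_p enlarges the covered
    # part of N the most; cov(R) = |union of U_p over p in R| (step 4-2).
    R, covered = [], set()
    while len(R) < k and len(R) < len(C):
        best = max((p for p in C if p not in R),
                   key=lambda p: len(covered | set(U.get(p, ()))))
        R.append(best)
        covered |= set(U.get(best, ()))
    return R

U = {"p1": ["u1", "u4"], "p2": ["u2", "u5"], "p3": ["u3"]}
R = regret_query(["p1", "p2", "p3"], U, 2)
```

Because Python's `max` keeps the first maximal element, the tie between p1 and p2 in the first round is resolved in favor of p1, matching the embodiment.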
The specific content of step 6 is as follows:
6-1, judge whether the core set needs maintenance, separately for the two cases of tuple insertion and tuple deletion; if so, go to step 6-2, and if not, go to step 5;
6-2, adjust the core set and the sets U_p according to the new nearest-neighbor relation, then go to step 4.
In step 6, for a tuple insertion, the specific process is as follows:
A6-1, normalize the tuple p; set up a set I and initialize it to be empty; for every point u in N, if the distance from u to its recorded nearest neighbor is greater than the distance from u to p, add u to I; if I is not empty, the core set needs maintenance, so go to step A6-2; if I is empty, no maintenance is needed, so go to step 5;
A6-2, insert tuple p into the core set and let U_p = I; for every point u in I, do the following: let q = NN(u) be the previous nearest neighbor of u, remove u from U_q and set NN(u) = p; judge whether U_q is now empty, and if so, remove q from the core set.
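A sketch of the insertion maintenance (steps A6-1 and A6-2), assuming p has already been normalized and that N, the nearest-neighbor map nn, the sets U and the core set C come from the construction above; the data-structure choices and the scenario values are illustrative:

```python
import math

def handle_insert(p, N, nn, U, C):
    # A6-1: I = sampled points now closer to p than to their recorded
    # nearest neighbor.
    I = [u for u in N if math.dist(u, nn[u]) > math.dist(u, p)]
    if not I:
        return False            # core set needs no maintenance
    # A6-2: p joins the core set and takes over the points in I.
    C.add(p)
    U[p] = list(I)
    for u in I:
        q = nn[u]               # previous nearest neighbor of u
        U[q].remove(u)
        nn[u] = p
        if not U[q]:            # q no longer represents any sampled point
            C.discard(q)
            del U[q]
    return True

# Hypothetical 1-sample scenario: u's neighbor moves from (0.0, 0.0) to p.
N = [(1.0, 0.4)]
nn = {(1.0, 0.4): (0.0, 0.0)}
U = {(0.0, 0.0): [(1.0, 0.4)]}
C = {(0.0, 0.0)}
changed = handle_insert((0.9, 0.4), N, nn, U, C)
```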
In step 6, for a tuple deletion, the specific process is as follows:
B6-1, judge whether the tuple p is in the core set; if so, the core set needs maintenance, so go to step B6-2; otherwise no maintenance is needed, so go to step 5;
B6-2, remove p from the core set; the previous nearest neighbor of every point in U_p has been deleted, so for every point u in U_p do the following: search for its nearest neighbor in the new data set, denote it q, remove u from U_p and add it to U_q, and set NN(u) = q; judge whether q is already in the core set, and if not, add q to the core set.
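The deletion case (steps B6-1 and B6-2) in the same sketch style; `data` is the data set after p has been removed, and the structures are as above (an illustrative layout, not the patent's literal data structures):

```python
import math

def handle_delete(p, data, nn, U, C):
    # B6-1: only the deletion of a core-set point requires maintenance.
    if p not in C:
        return False
    # B6-2: every point in U_p lost its nearest neighbor; re-attach each
    # one to its nearest neighbor q in the remaining data.
    C.discard(p)
    for u in U.pop(p):
        q = min(data, key=lambda t: math.dist(u, t))
        nn[u] = q
        U.setdefault(q, []).append(u)
        C.add(q)                # no-op if q is already in the core set
    return True

# Hypothetical scenario: deleting (1.0, 1.0) re-attaches u to (0.5, 0.5).
data = [(0.0, 0.0), (0.5, 0.5)]
nn = {(1.2, 1.2): (1.0, 1.0)}
U = {(1.0, 1.0): [(1.2, 1.2)]}
C = {(1.0, 1.0)}
changed = handle_delete((1.0, 1.0), data, nn, U, C)
```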
Compared with the prior art, the above scheme of the invention has the following beneficial effects:
(1) by updating the regret-ratio minimization query result set on the basis of core-set maintenance, part of the update operations that do not affect the core set are omitted; compared with the prior art, which must update the set system for every change, the average processing time per data change is greatly reduced and continuous regret-ratio minimization queries can be completed more efficiently;
(2) the prior art must store a huge set system during processing, occupying a large amount of storage space; in contrast, the present method has no such large storage requirement.
Drawings
FIG. 1 is an overall flow diagram of the present invention;
FIG. 2 is a schematic diagram of a core set construction process for a 2-dimensional case;
the dots on the 1/4 circle in the figure represent the randomly sampled points u; the points in the square frame represent the normalized tuples of the database; a connecting line represents the nearest-neighbor relation between two points, where the nearest neighbor of the starting point u is the end point NN(u); the dots circled with dotted lines represent the points selected into the core set;
FIG. 3 is a flow diagram of computing a result set;
FIG. 4 is a diagram illustrating the nearest-neighbor relationships between points at each stage in an embodiment of the invention;
wherein u_i denotes the 5 randomly sampled points of the embodiment, which lie on the 1/4 circle, in the non-negative quadrant, of radius √2 centered at the origin; p_i denotes the normalized tuples of the database, whose coordinates all lie in the [0,1] interval, so they lie in the square frame formed by the coordinate axes and the dashed lines; a solid connecting line represents the nearest-neighbor relation between two points, where the nearest neighbor of the starting point is the end point; a dotted connecting line represents a nearest-neighbor relation deleted by the current tuple change; a dashed connecting line represents a nearest-neighbor relation added by the current tuple change;
in FIG. 4, (a) shows the nearest-neighbor relation on the initial data set, (b) shows the nearest-neighbor relation after inserting p_6, (c) shows the nearest-neighbor relation after inserting p_7, and (d) shows the nearest-neighbor relation after deleting p_4 and p_2.
Detailed Description
The invention provides a continuous regret-ratio minimization query method based on a core set: it first constructs an initial core set from the initial database and computes an initial regret-ratio minimization query result set; it then monitors the database for tuple changes, namely insertions of new tuples and deletions of existing tuples, and maintains the core set and the result set after each change, thereby completing continuous regret-ratio minimization queries efficiently over a non-static database.
As shown in fig. 1, the present invention comprises the steps of:
step 1, normalize the original data set D of dimension d so that the attribute values of all tuples lie in the [0,1] interval;
the specific operation of step 1 is as follows: first, search for and record the maximum attribute value in each dimension over all tuples of the data set D and of the change sequence, denoted m_1, m_2, …, m_d respectively; then set each attribute value of every tuple to the original value divided by the recorded maximum of the corresponding dimension, i.e. p[i] ← p[i]/m_i, where p[i] denotes the attribute value of tuple p in the i-th dimension; after conversion, the attribute values of all tuples lie in the [0,1] interval;
step 2, construct the core set C of the original data; as shown in fig. 2, this mainly includes the following steps:
2-1, initialize the core set to the empty set, i.e. let C = ∅;
2-2, on the part of the surface, lying in the non-negative quadrant, of the d-dimensional sphere of radius √d centered at the origin of coordinates, sample a number of points u, and denote the set they form by N; specifically, first obtain d non-negative random numbers, denoted u_1, u_2, …, u_d, then let u_i ← √d · u_i / √(u_1² + u_2² + … + u_d²), so that the tuple (u_1, u_2, …, u_d) represents a point randomly sampled on that part of the sphere surface; add the obtained point to the set N and repeat the process several times, finally obtaining a set N of points randomly sampled on the part of the surface, lying in the non-negative quadrant, of the d-dimensional sphere of radius √d;
2-3, for every point u in N, search for and record its nearest neighbor NN(u) in the normalized data, i.e. the point of the normalized initial data closest to u in Euclidean distance, namely NN(u) = argmin_{p∈D} ||u − p||, and add it to the core set, obtaining the core set C = ∪_{u∈N} NN(u).
Step 3, for each point p in the core set, define and compute the set U_p of points in N whose nearest neighbor is p, i.e. U_p = {u ∈ N | p = NN(u)}; the specific process is as follows: for a point p, traverse all points u in N and judge, according to the nearest-neighbor relation recorded in step 2-3, whether p is the nearest neighbor of u; if so, i.e. p = NN(u), add u to U_p. Each point in the core set corresponds to one set U_p, which represents the set of all points in N whose nearest neighbor is p;
step 4, calculate the regret-ratio minimization query result set R; with reference to fig. 3, this specifically includes the following process:
4-1, initialize the regret-ratio minimization query result set to the empty set, i.e. let R = ∅;
4-2, calculate the coverage value cov(R) of the result set as the size of the union of the sets U_p over all points in the result set, i.e. cov(R) = |∪_{p∈R} U_p|; cov(R) indicates how many points of N have their nearest neighbor in R;
4-3, traverse all points p in the core set and find the point p′ whose addition increases the coverage value of the result set more than the addition of any other point, namely:
p′ = argmax_{p∈C} (cov(R ∪ {p}) − cov(R)),
and add p′ to the result set; if several points attain the maximum coverage-value increment simultaneously, the point is selected in subscript order, although when the number of random sampling points is large, i.e. the set N is large, this situation hardly occurs;
4-4, repeat steps 4-2 to 4-3 until the size of the result set equals the size specified by the user, then go to step 4-5;
4-5, record the regret-ratio minimization query result set obtained in step 4-4 and return it to the user;
step 5, wait for a tuple change in the database and prepare to process it; if no insertion or deletion of a tuple occurs in the database, the query process ends, otherwise step 6 is executed;
step 6, maintain the core set and the regret-ratio minimization query result set, which specifically comprises:
6-1, judge whether the core set needs maintenance for the tuple insertion or deletion at hand, analyzed in the following two cases; if the core set needs maintenance, execute step 6-2, and if not, go to step 5:
a) insertion of tuple p: set each attribute value of p to the original value divided by the maximum of the corresponding dimension recorded in step 1, i.e. normalize p; then set up a set I and initialize it to be empty; for every point u in N, if the distance from u to its recorded nearest neighbor is greater than the distance from u to p, i.e. ||u − NN(u)|| > ||u − p||, add u to I, so that I represents the set of points whose nearest neighbor is affected by the insertion of tuple p, i.e. the points in I take p as their new nearest neighbor; if I is not empty, the core set needs maintenance, and if I is empty, no maintenance is needed;
b) deletion of tuple p: if p is in the core set, the core set needs maintenance, otherwise no maintenance is needed.
6-2, adjust the core set and the sets U_p according to the new nearest-neighbor relation, then go to step 4; the adjustment differs for the two cases of tuple insertion and deletion:
a) insertion of tuple p: insert the point p into the core set and let U_p = I, where I is the set obtained in step 6-1; the new nearest neighbor of every point in I has become p, and for every point u in I the following is performed: let q = NN(u) be the previous nearest neighbor of u, remove u from U_q and set NN(u) = p; if U_q is now empty, remove q from the core set;
b) deletion of tuple p: remove the point p from the core set; the previous nearest neighbor of every point in U_p has been deleted, so for every point u in U_p do the following: search for its nearest neighbor in the new data set, denote it q, remove u from U_p and add it to U_q, and set NN(u) = q; if q is not already in the core set, add q to the core set.
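Putting steps 1 to 6 together, the continuous query can be sketched as a generator that yields one result set for the initial database and one after every change; this is a compact, self-contained illustration under the same assumptions as the sketches above, with sampling, tie-breaking order and data structures simplified:

```python
import math
import random

def continuous_query(data, changes, num_samples, k, seed=0):
    """Yield a result set for the initial data and after each change,
    where `changes` is a list of ("insert"/"delete", tuple) pairs."""
    rng = random.Random(seed)
    d = len(next(iter(data)))
    data = set(data)

    # Step 2: sample N on the sphere of radius sqrt(d), non-negative quadrant.
    N = []
    for _ in range(num_samples):
        v = [rng.random() for _ in range(d)]
        s = math.sqrt(sum(x * x for x in v))
        N.append(tuple(math.sqrt(d) * x / s for x in v))

    nn = {u: min(data, key=lambda p: math.dist(u, p)) for u in N}
    U = {}
    for u, p in nn.items():
        U.setdefault(p, []).append(u)
    C = set(U)

    def greedy():
        # Step 4: greedy maximum coverage over the U_p sets.
        R, cov = [], set()
        while len(R) < k and len(R) < len(C):
            best = max((p for p in C if p not in R),
                       key=lambda p: len(cov | set(U[p])))
            R.append(best)
            cov |= set(U[best])
        return R

    yield greedy()
    for op, p in changes:
        if op == "insert":                      # steps A6-1 / A6-2
            data.add(p)
            I = [u for u in N if math.dist(u, nn[u]) > math.dist(u, p)]
            if I:
                C.add(p)
                U[p] = I
                for u in I:
                    q = nn[u]
                    U[q].remove(u)
                    nn[u] = p
                    if not U[q]:
                        C.discard(q)
                        del U[q]
        else:                                   # steps B6-1 / B6-2
            data.discard(p)
            if p in C:
                C.discard(p)
                for u in U.pop(p):
                    q = min(data, key=lambda t: math.dist(u, t))
                    nn[u] = q
                    U.setdefault(q, []).append(u)
                    C.add(q)
        yield greedy()

results = list(continuous_query({(0.1, 0.1), (0.9, 0.9)},
                                [("insert", (0.95, 0.95)),
                                 ("delete", (0.9, 0.9))],
                                num_samples=10, k=1))
```

Only changes that alter some nearest-neighbor relation (insertion) or remove a core-set point (deletion) trigger maintenance; all other changes are skipped, which is the source of the method's efficiency gain.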
The technical solution of the present invention will be described in detail by a specific example.
This embodiment assumes that the data dimension equals 2, i.e. d = 2, and that the result-set size required by the user is 2. First assume that the initial database contains the tuples p_1, p_2, p_3, p_4, p_5, whose attribute values are as follows:
TABLE 1
(table available only as an image in the original document)
Assume that the database changes four times in total, in order: insert p_6(90,105), insert p_7(160,30), delete p_4, delete p_2.
Step (1): search for the maximum attribute value in each dimension over all tuples of the initial database and of the change sequence, obtaining m_1 = p_7[1] = 160 and m_2 = p_2[2] = 150; set each attribute value of every tuple to the original value divided by the recorded maximum of the corresponding dimension, e.g. p_1[1] = 100/160 = 0.625, and so on for the remaining values. The normalized attribute values of the tuples are then as follows:
TABLE 2
(table available only as an image in the original document)
Step (2-1): let C = ∅.
Step (2-2): obtain d = 2 non-negative random numbers; assuming u_1 = 71 and u_2 = 29 are obtained, rescaling the vector (71, 29) onto the sphere surface yields the sampled point, which is added to N.
This is repeated several times (5 times in this embodiment); assume that the points of the finally obtained set N are as follows:
TABLE 3
(table available only as an image in the original document)
Step (2-3): ||u_1 − p_1|| = 1.814, ||u_1 − p_2|| = 2.498, ||u_1 − p_3|| = 2.085, ||u_1 − p_4|| = 1.955, ||u_1 − p_5|| = 2.273; comparing these values, p_1 is closest to u_1, so the nearest neighbor of u_1 is NN(u_1) = p_1, and p_1 is added to the core set. Similarly, the distance information between the points is as follows:
TABLE 4
(table available only as an image in the original document)
This yields NN(u_2) = p_2, NN(u_3) = p_3, NN(u_4) = p_1, NN(u_5) = p_2; the points and their nearest neighbors are shown in FIG. 4(a). The nearest neighbor of each point at this time is recorded as follows:
TABLE 5
(table available only as an image in the original document)
The core set is thus C = {p_1, p_2, p_3}.
Step (3): since p_1 = NN(u_1) and p_1 = NN(u_4), U_{p1} = {u_1, u_4}; similarly U_{p2} = {u_2, u_5} and U_{p3} = {u_3}. The U_p information of each core-set point at this time is as follows:
TABLE 6
(table available only as an image in the original document)
Step (4-1): let R = ∅.
Step (4-2): calculate cov(R) = 0.
Step (4-3): cov(R ∪ p_1) − cov(R) = |U_{p1}| − 0 = 2, cov(R ∪ p_2) − cov(R) = |U_{p2}| − 0 = 2, cov(R ∪ p_3) − cov(R) = |U_{p3}| − 0 = 1; here p_1 and p_2 attain the maximum coverage-value increment simultaneously, so p_1 is chosen in subscript order and added to the result set, giving R = {p_1}.
Step (4-4): the size of R is 1 and the user-specified result-set size is 2, so steps (4-2) to (4-3) are repeated.
Step (4-2): calculate cov(R) = |U_{p1}| = 2.
Step (4-3): cov(R ∪ p_1) − cov(R) = |U_{p1} ∪ U_{p1}| − |U_{p1}| = 2 − 2 = 0, cov(R ∪ p_2) − cov(R) = |U_{p1} ∪ U_{p2}| − |U_{p1}| = 4 − 2 = 2, cov(R ∪ p_3) − cov(R) = |U_{p1} ∪ U_{p3}| − |U_{p1}| = 3 − 2 = 1, so p_2 is added to the result set, giving R = {p_1, p_2}.
Step (4-4): the size of R is 2, equal to the user-specified result-set size, so go to step (4-5).
Step (4-5): record and return the current regret-ratio minimization query result set R = {p_1, p_2}.
Step (5): the change "insert p_6(90,105)" occurs in the database, so step (6-1) is executed.
Step (6-1): normalize the attribute values of p_6, obtaining p_6(90/160, 105/150), i.e. p_6(0.563, 0.7). The distances between p_6 and each point in N are as follows:
TABLE 7
(table available only as an image in the original document)
Because ||u_1 − NN(u_1)|| = ||u_1 − p_1|| = 1.848 < ||u_1 − p_6|| = 1.979, u_1 is not added to I; similarly:
||u_2 − NN(u_2)|| = ||u_2 − p_2|| = 1.420 < ||u_2 − p_6|| = 1.804, so u_2 is not added to I;
||u_3 − NN(u_3)|| = ||u_3 − p_3|| = 1.565 > ||u_3 − p_6|| = 1.524, so u_3 is added to I;
||u_4 − NN(u_4)|| = ||u_4 − p_1|| = 1.722 > ||u_4 − p_6|| = 1.685, so u_4 is added to I;
||u_5 − NN(u_5)|| = ||u_5 − p_2|| = 1.464 < ||u_5 − p_6|| = 1.573, so u_5 is not added to I.
Thus I = {u_3, u_4}, which is not empty, so core-set maintenance is performed and step (6-2) is executed.
Step (6-2): add p_6 to the core set and let U_{p6} = I = {u_3, u_4}; operate on each point in I separately. For u_3: q = NN(u_3) = p_3, remove u_3 from U_{p3}; U_{p3} is now empty, so remove p_3 from the core set. For u_4: q = NN(u_4) = p_1, remove u_4 from U_{p1}; U_{p1} = {u_1} is not empty. The U_p information of each core-set point at this time is as follows:
TABLE 8
(table available only as an image in the original document)
The points and their nearest neighbors are shown in FIG. 4(b). The updated nearest neighbor of each point at this time is as follows:
TABLE 9
(table available only as an image in the original document)
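The first maintenance operation can be replayed directly from the printed distances; the following sketch (string labels for the points, distances copied from the comparisons above) reproduces I = {u_3, u_4}, the removal of p_3, and the updated core set:

```python
# Distances ||u_i - NN(u_i)|| before the change and ||u_i - p6|| after,
# as listed in the embodiment.
old_dist = {"u1": 1.848, "u2": 1.420, "u3": 1.565, "u4": 1.722, "u5": 1.464}
new_dist = {"u1": 1.979, "u2": 1.804, "u3": 1.524, "u4": 1.685, "u5": 1.573}
nn = {"u1": "p1", "u2": "p2", "u3": "p3", "u4": "p1", "u5": "p2"}
U = {"p1": ["u1", "u4"], "p2": ["u2", "u5"], "p3": ["u3"]}
C = {"p1", "p2", "p3"}

# Step (6-1): collect the sampled points whose nearest neighbor moves to p6.
I = [u for u in old_dist if old_dist[u] > new_dist[u]]

# Step (6-2): insert p6, repair the affected U_q sets, drop empty ones.
C.add("p6")
U["p6"] = list(I)
for u in I:
    q = nn[u]
    U[q].remove(u)
    nn[u] = "p6"
    if not U[q]:
        C.discard(q)
        del U[q]
```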
Then the process proceeds to step (4a).
Step (4a) is executed similarly to step (4) above and is not repeated; the regret-ratio minimization query result set at this time, R = {p_2, p_6}, is returned.
(At this point the first change has been processed.)
Step (5a): the change "insert p_7(160,30)" occurs in the database, so step (6-1a) is executed.
Step (6-1a): normalize the attribute values of p_7, obtaining p_7(160/160, 30/150), i.e. p_7(1, 0.2). The distances between p_7 and each point in N are as follows:
TABLE 10
(table available only as an image in the original document)
Because ||u_1 − NN(u_1)|| = ||u_1 − p_1|| = 1.848 > ||u_1 − p_7|| = 1.428, u_1 is added to I; similarly:
||u_2 − NN(u_2)|| = ||u_2 − p_2|| = 1.420 < ||u_2 − p_7|| = 2.429, so u_2 is not added to I;
||u_3 − NN(u_3)|| = ||u_3 − p_6|| = 1.524 < ||u_3 − p_7|| = 1.665, so u_3 is not added to I;
||u_4 − NN(u_4)|| = ||u_4 − p_6|| = 1.685 > ||u_4 − p_7|| = 1.425, so u_4 is added to I;
||u_5 − NN(u_5)|| = ||u_5 − p_2|| = 1.464 < ||u_5 − p_7|| = 2.036, so u_5 is not added to I.
Thus I = {u_1, u_4}, which is not empty, so core-set maintenance is performed and step (6-2a) is executed.
Step (6-2a): add p7 to the core set and let Up7 = I = {u1, u4}. Each point in I is processed in turn: for u1, q = NN(u1) = p1, so u1 is removed from Up1; Up1 is now empty, so p1 is removed from the core set. For u4, q = NN(u4) = p6, so u4 is removed from Up6; Up6 = {u3} is not empty, so p6 remains. At this point, the set Up of each point in the core set is as follows:
TABLE 11
core point    Up
p2            {u2, u5}
p6            {u3}
p7            {u1, u4}
The relationship between the points and their nearest neighbors is shown in FIG. 4(c). The updated nearest neighbor of each point is as follows:
TABLE 12
point    nearest neighbor
u1       p7
u2       p2
u3       p6
u4       p7
u5       p2
Then, the process proceeds to step (4b).
Step (4b) is executed (same as step (4) above), and the regret-ratio minimization query result set at this point, R = {p2, p7}, is returned.
(This completes the second change processing.)
Step (5b): the tuple p4 is deleted from the database; step (6-1b) is executed.
Step (6-1b): because p4 is not in the core set, no core set maintenance is needed, and step (5c) is executed.
(This completes the third change processing.)
Step (5c): the tuple p2 is deleted from the database; step (6-1c) is executed.
Step (6-1c): p2 is in the core set, so core set maintenance is required; step (6-2c) is executed.
Step (6-2c): remove p2 from the core set. Up2 = {u2, u5}, and a new nearest neighbor must be found for each of these points; the relevant distance information is as follows:
TABLE 13
Figure BDA0003152836470000141
For u2, searching for its nearest neighbor in the latest data set yields NN(u2) = p5; since p5 is not yet in the core set, p5 is added to the core set and u2 is added to Up5. For u5, searching for its nearest neighbor in the latest data set yields NN(u5) = p5; p5 is already in the core set (it was added when u2 was processed), so u5 is simply added to Up5. At this point, the set Up of each point in the core set is as follows:
TABLE 14
core point    Up
p5            {u2, u5}
p6            {u3}
p7            {u1, u4}
The relationship between the points and their nearest neighbors is shown in FIG. 4(d). The updated nearest neighbor of each point is as follows:
TABLE 15
point    nearest neighbor
u1       p7
u2       p5
u3       p6
u4       p7
u5       p5
Then, the process proceeds to step (4d).
Step (4d) is executed (same as step (4) above), and the regret-ratio minimization query result set at this point, R = {p5, p7}, is returned.
(This completes the fourth change processing.)
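The deletion handling of steps (6-1b/c) and (6-2c) can be sketched in the same style. Again, this is a minimal sketch under assumed names and containers, not the patent's own implementation.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def handle_delete(p, data, nn, U, core):
    """Steps (6-1b/c)/(6-2c): p is the deleted tuple and data is the
    data set after the deletion; every sampled point that took p as
    nearest neighbor is re-assigned, possibly promoting new points
    into the core set."""
    if p not in core:
        return              # (6-1b): p is not in the core set, nothing to do
    core.discard(p)
    orphans = U.pop(p, set())
    for u in orphans:
        # re-search the nearest neighbor of u in the remaining data
        q = min(data, key=lambda t: dist(u, t))
        if q not in core:   # promote q into the core set if needed
            core.add(q)
            U[q] = set()
        U[q].add(u)
        nn[u] = q
```

As in the worked example, a deleted non-core tuple costs nothing, while a deleted core point triggers one nearest-neighbor re-search per point in its Up set.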
Step (5d): the database is not changed any more, and the method ends.
In summary, the core-set-based continuous regret-ratio minimization query method of the present invention accounts for the influence of tuple insertions and deletions in the database on the regret-ratio minimization query result. It constructs a core set based on nearest neighbor search and performs the regret-ratio minimization query by combining the core set with a maximum coverage method. Starting from the initial core set and the initial regret-ratio minimization query result set, whenever a tuple is inserted into or deleted from the database, the method judges whether the core set must change; if so, it triggers maintenance of the core set and an update of the regret-ratio minimization query result set, thereby satisfying the requirement of continuous regret-ratio minimization queries. The invention uses the nearest neighbor relationships within the core set to adjust the core set quickly, which effectively reduces maintenance time and improves the efficiency of continuous regret-ratio minimization queries over a non-static database.
The above embodiments are intended only to illustrate the technical idea of the present invention; they do not limit its protection scope, and any modification made to the technical scheme on the basis of this technical idea falls within the protection scope of the present invention.

Claims (9)

1. A continuous regret-ratio minimization query method based on a core set, characterized by comprising the following steps:
step 1, normalizing an original data set D of dimension d so that the attribute values of all tuples in D lie in the [0,1] interval;
step 2, constructing a core set C of the original data;
step 3, for every point p in the core set, constructing and computing the set Up of points in N whose nearest neighbor is p, where N is a set consisting of a plurality of points randomly sampled on the surface, within the non-negative quadrant, of a d-dimensional sphere of radius
Figure FDA0003152836460000011
;
step 4, computing the regret-ratio minimization query result set R;
step 5, waiting for a tuple change in the database: if no tuple insertion or deletion occurs, the query process ends; otherwise, step 6 is executed;
step 6, maintaining the core set and the regret-ratio minimization query result set.
2. The core-set-based continuous regret-ratio minimization query method of claim 1, wherein the specific content of step 1 is as follows: first, the maximum attribute value in each dimension over all tuples of the original data set D and the change sequence is found and recorded, the maxima being denoted m1, m2, …, md respectively; then, for every tuple, the attribute value of each dimension is set to the original attribute value divided by the recorded maximum of the corresponding dimension, i.e.
Figure FDA0003152836460000012
p[i] = p[i]/mi, where p[i] denotes the attribute value of tuple p in the i-th dimension; after this, the attribute values of all tuples lie in the [0,1] interval.
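Assuming tuples are held as a Python list of equal-length tuples, the per-dimension normalization described in this claim can be sketched as follows (the function name and data representation are assumptions):

```python
def normalize(tuples):
    """Divide each attribute by the recorded per-dimension maximum m_i
    (p[i] = p[i] / m_i), so every value lands in the [0, 1] interval."""
    d = len(tuples[0])
    m = [max(t[i] for t in tuples) for i in range(d)]   # m_1, ..., m_d
    return [tuple(t[i] / m[i] for i in range(d)) for t in tuples]
```

With the maxima 160 and 150 of the worked example in the description, the tuple (90, 105) normalizes to (0.5625, 0.7), matching p6.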
3. The core-set-based continuous regret-ratio minimization query method of claim 1, wherein the specific content of step 2 is as follows:
2-1, let
Figure FDA0003152836460000013
2-2, sample a plurality of points u on the surface, within the non-negative quadrant, of a d-dimensional sphere centered at the coordinate origin with radius
Figure FDA0003152836460000014
and denote the set formed by these points N; specifically, first obtain d non-negative random numbers, denoted u1, u2, …, ud, then let
Figure FDA0003152836460000015
so that the tuple (u1, u2, …, ud) represents a point randomly sampled, within the non-negative quadrant, on the surface of a d-dimensional sphere of radius
Figure FDA0003152836460000021
; this point is added to the set N, and the process is repeated several times to obtain the set N consisting of a plurality of points randomly sampled on the surface, within the non-negative quadrant, of a d-dimensional sphere of radius
Figure FDA0003152836460000022
;
2-3, for every point u in N, find and record its nearest neighbor NN(u) in the normalized data, i.e., the point in the normalized initial data closest to u in Euclidean distance; that is, let
Figure FDA0003152836460000023
and add it to the core set, thereby obtaining the core set C = ∪u∈N NN(u).
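Steps 2-1 to 2-3 can be sketched as follows. The radius appears only as an image in the original publication, so it is left here as a parameter r; the sample count m and all names are likewise assumptions.

```python
import math
import random

def sample_sphere_points(d, r, m):
    """Steps 2-1/2-2: sample m points on the surface of the d-dimensional
    sphere of radius r within the non-negative quadrant, by drawing d
    non-negative random numbers and rescaling the vector to length r."""
    N = []
    for _ in range(m):
        u = [random.random() for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in u))
        N.append(tuple(r * x / norm for x in u))
    return N

def build_core_set(N, data):
    """Step 2-3: the core set is the union of the nearest neighbors of
    the sampled points, C = union over u in N of NN(u)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return {min(data, key=lambda t: dist(u, t)) for u in N}
```

Rescaling a random non-negative vector, as the claim prescribes, puts every sample exactly on the sphere surface inside the non-negative quadrant.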
4. The core-set-based continuous regret-ratio minimization query method of claim 1, wherein the specific process of step 3 is as follows: for each point p in the core set, traverse all points u in N and determine whether p is the nearest neighbor of u, i.e., whether p = NN(u); if so, add u to Up.
5. The core-set-based continuous regret-ratio minimization query method of claim 1, wherein the specific process of step 4 is as follows:
4-1, let
Figure FDA0003152836460000024
4-2, compute the coverage value cov(R) of the result set as the size of the union of the sets Up over all points in the result set, i.e., cov(R) = |∪p∈R Up|; cov(R) indicates how many points in N have their nearest neighbor in R;
4-3, traverse all points p in the core set and find the point p′ such that the increase in the coverage value after adding p′ to the result set is larger than the increase after adding any other point, namely:
p′ = argmaxp∈C cov(R ∪ {p}) − cov(R),
and add the point p′ to the result set;
4-4, repeat steps 4-2 to 4-3 until the size of the result set equals the size specified by the user, then go to step 4-5;
4-5, record the regret-ratio minimization query result set obtained in step 4-4 and return it to the user.
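Assuming the core set is kept as a list (so that ties fall back to subscript order, as claim 6 specifies) and U maps each core point to its set Up, the greedy loop of steps 4-1 to 4-5 can be sketched as:

```python
def regret_query(core, U, k):
    """Steps 4-1 to 4-5: greedily pick k core points maximizing the
    coverage cov(R) = |union of U_p for p in R|."""
    R, covered = [], set()
    pool = list(core)        # list order doubles as subscript order
    for _ in range(min(k, len(pool))):
        # step 4-3: the point whose addition increases coverage the most;
        # max() keeps the earliest maximizer, i.e. the lowest subscript
        best = max(pool, key=lambda p: len(U.get(p, set()) - covered))
        pool.remove(best)
        R.append(best)       # add p' to the result set
        covered |= U.get(best, set())
    return R
```

Because each pick only needs the marginal gain |Up \ covered|, the loop is a standard greedy maximum-coverage selection over the core set.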
6. The core-set-based continuous regret-ratio minimization query method of claim 5, wherein in step 4-3, if a plurality of points achieve the same maximum increase in coverage value, the point is selected in subscript order.
7. The core-set-based continuous regret-ratio minimization query method of claim 1, wherein the specific content of step 6 is as follows:
6-1, determine whether the core set needs maintenance according to the two cases of tuple insertion and tuple deletion, respectively; if so, go to step 6-2, otherwise go to step 5;
6-2, adjust the core set and the sets Up according to the new nearest neighbor relationships, then go to step 4.
8. The core-set-based continuous regret-ratio minimization query method of claim 7, wherein in step 6, for the case of tuple insertion, the specific process is as follows:
A6-1, normalize the inserted tuple p; set up a set I and initialize it to be empty; for every point u in N, if the distance between u and its recorded nearest neighbor is greater than the distance between u and p, add u to I; if I is not empty, the core set must be maintained, so go to step A6-2; if I is empty, no maintenance is needed, so go to step 5;
A6-2, insert the tuple p into the core set and let Up = I; for every point u in I, do the following: let q = NN(u) be the previously recorded nearest neighbor of u, remove u from Uq and add it to Up, then judge whether Uq is an empty set; if so, remove q from the core set.
9. The core-set-based continuous regret-ratio minimization query method of claim 7, wherein in step 6, for the case of tuple deletion, the specific process is as follows:
B6-1, judge whether the tuple p is in the core set; if so, the core set must be maintained, so go to step B6-2; otherwise no maintenance is needed, so go to step 5;
B6-2, remove p from the core set; the original nearest neighbor of every point in Up has been deleted, so for every point u in Up do the following: find the nearest neighbor of u in the new data set, denoted q; remove u from Up and add it to Uq; judge whether q is already in the core set, and if not, add q to the core set.
CN202110770688.1A 2021-07-07 2021-07-07 Continuous regret-ratio minimization query method based on core set Active CN113448994B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110770688.1A CN113448994B (en) Continuous regret-ratio minimization query method based on core set

Publications (2)

Publication Number Publication Date
CN113448994A true CN113448994A (en) 2021-09-28
CN113448994B CN113448994B (en) 2023-02-03

Family

ID=77815406

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024077646A1 (en) * 2022-10-10 2024-04-18 深圳计算科学研究院 Incremental query method based on linear programming

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649489A (en) * 2016-09-28 2017-05-10 南京航空航天大学 Continuous skyline query processing mechanism in geographic text information data
CN108932251A (en) * 2017-05-25 2018-12-04 郑州大学 A kind of k- on the frequent updating data set based on sequence dominates search algorithm Skyline
US10200814B1 (en) * 2018-04-24 2019-02-05 The Florida International University Board Of Trustees Voronoi diagram-based algorithm for efficient progressive continuous k-nearest neighbor query for moving objects
CN112691383A (en) * 2021-01-14 2021-04-23 上海交通大学 Texas poker AI training method based on virtual regret minimization algorithm

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ABOLFAZL ASUDEH et al.: "Efficient Computation of Regret-ratio Minimizing Set: A Compact Maxima Representative", SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data *
DANUPON NANONGKAI et al.: "Regret-minimizing representative databases", Proceedings of the VLDB Endowment *
QI DONG et al.: "Faster Algorithms for k-Regret Minimizing Sets via Monotonicity and Sampling", CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management *
YANHAO WANG et al.: "A Fully Dynamic Algorithm for k-Regret Minimizing Sets", https://arxiv.org/pdf/2005.14493.pdf *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant