CN106528790A - Method and device for selecting support point in metric space - Google Patents

Method and device for selecting support point in metric space

Info

Publication number
CN106528790A
Authority
CN
China
Prior art keywords
data
support point
distance
value
calculated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610987363.8A
Other languages
Chinese (zh)
Other versions
CN106528790B (en
Inventor
毛睿
李兴亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University
Priority to CN201610987363.8A
Publication of CN106528790A
Application granted
Publication of CN106528790B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a method and a device for selecting support points in a metric space. The method comprises: selecting a datum from a target data set according to a preset rule and taking it as a support point; taking the data in the target data set other than those already serving as support points as the data to be calculated; calculating the distance between each datum to be calculated and the most recently selected support point, and determining a minimum distance value for each datum by comparing the calculated distance with its pre-stored distance values; and selecting the datum whose minimum distance value is the largest among all minimum distance values, and taking it as the next support point. Because peripheral points of the metric space make good support points, computing distances only to a preset number of the most recently selected support points allows peripheral points to be selected quickly and accurately, which improves the performance of metric-space data management and analysis and the efficiency of index construction.

Description

Method and device for selecting support points in a metric space
Technical field
The invention belongs to the field of computer technology, and in particular relates to a method and a device for selecting support points in a metric space.
Background
A metric space is an abstraction that covers a very wide range of data types. Its greatest advantage is its generality: the user only needs to provide a distance function to carry out similarity search on the data. However, abstracting data into points of a metric space, while improving generality, also discards coordinate information; the only information available is the distance values. Without coordinates, many mathematical tools cannot be applied directly. The usual approach is to first select support points (pivots) and then use the distances from each datum to the support points as its coordinates. The quality of the selected support points is therefore critical to the performance of metric-space data management and analysis.
In the prior art, in the field of metric-space indexing, it is generally held that good support points tend to be points at the corners of the data. Farthest First Traversal (FFT) is a clustering method that finds the peripheral points of the data; it has linear time and space complexity and is one of the most popular support-point selection algorithms. Although FFT can select points at the periphery and corners of the data in linear time, research has found that the optimal support points are often not corner points but points slightly away from the corners, so FFT often fails to select them. To reach the optimal support points slightly away from the corners, FFT must select many more points, which greatly increases the number of support points selected and causes its time complexity to degrade to nearly O(n^2). Time complexity expresses the amount of computation an algorithm requires, so it is a measure of an algorithm's quality.
In general, the better support points are points on the periphery of the metric space. The prior-art FFT algorithm cannot select peripheral points both accurately and quickly, which degrades the performance of metric-space data management and analysis.
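For orientation, the prior-art FFT procedure mentioned above can be sketched as follows. This is a minimal illustration of the textbook farthest-first traversal, not code taken from the patent; the function name, the index-based return value and the `start` parameter are assumptions made for the example.

```python
def farthest_first_traversal(data, metric, k, start=0):
    """Return the indices of up to k pivots chosen by farthest-first traversal.

    `metric(a, b)` is any distance function; `start` is the index of the
    first pivot (an arbitrary choice made for this sketch).
    """
    if k < 1 or not data:
        return []
    pivots = [start]
    # min_dist[i] is the distance from data[i] to its nearest pivot so far.
    min_dist = [metric(x, data[start]) for x in data]
    while len(pivots) < min(k, len(data)):
        candidates = [i for i in range(len(data)) if i not in pivots]
        # The next pivot is the datum farthest from all pivots chosen so far.
        nxt = max(candidates, key=lambda i: min_dist[i])
        pivots.append(nxt)
        for i, x in enumerate(data):
            min_dist[i] = min(min_dist[i], metric(x, data[nxt]))
    return pivots
```

In this textbook form every datum remembers its distance to the nearest of all pivots selected so far, which is why each new pivot is pushed away from every earlier one.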
Summary of the invention
The present invention provides a method and a device for selecting support points in a metric space, aiming to solve the problem that, although the better support points are points on the periphery of the metric space, the prior-art FFT algorithm cannot select peripheral points accurately and quickly, which degrades the performance of metric-space data management and analysis.
The method for selecting support points in a metric space provided by the present invention comprises: selecting a datum from a target data set according to a preset rule, and taking the selected datum as a support point; taking the data in the target data set other than the data serving as support points as the data to be calculated; calculating the distance between each datum to be calculated and the most recently selected support point, and determining a minimum distance value from the comparison between the calculated distance and the pre-stored distance values; and selecting the datum corresponding to the largest of the minimum distance values, and taking the selected datum as a support point.
The device for selecting support points in a metric space provided by the present invention comprises: a selection module, configured to select a datum from a target data set according to a preset rule and take the selected datum as a support point; the selection module being further configured to take the data in the target data set other than the data serving as support points as the data to be calculated; a calculation module, configured to calculate the distance between each datum to be calculated and the most recently selected support point, and to determine a minimum distance value from the comparison between the calculated distance and the pre-stored distance values; the selection module being further configured to select the datum corresponding to the largest of the minimum distance values and take the selected datum as a support point.
With the method and device provided by the present invention, a datum is selected from a target data set according to a preset rule and taken as a support point; the data in the target data set other than the data serving as support points are taken as the data to be calculated; the distance between each datum to be calculated and the most recently selected support point is calculated, and a minimum distance value is determined from the comparison between the calculated distance and the pre-stored distance values; and the datum corresponding to the largest of the minimum distance values is selected and taken as a support point. Because the peripheral points of a metric space make good support points, computing distances only to a preset number of the most recently selected support points allows the peripheral points to be selected quickly and accurately, which improves the performance of metric-space data management and analysis and the efficiency of building indexes.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention.
Fig. 1 is a flowchart of the method for selecting support points in a metric space provided by the first embodiment of the present invention;
Fig. 2 is a flowchart of the method for selecting support points in a metric space provided by the second embodiment of the present invention;
Fig. 3 is a structural diagram of the device for selecting support points in a metric space provided by the third embodiment of the present invention;
Fig. 4 is a structural diagram of the device for selecting support points in a metric space provided by the fourth embodiment of the present invention;
Fig. 5 is a schematic diagram comparing the point-selection accuracy of the FFT algorithm and RFT on the UV data;
Fig. 6 is a schematic diagram comparing the point-selection accuracy of the FFT algorithm and RFT on the Image data.
Detailed description of the embodiments
To make the objects, features and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
It should be noted that most data in this field satisfy the properties of a metric space, and the target data set in the embodiments of the present invention also satisfies the properties of a metric space.
Referring to Fig. 1, Fig. 1 is a flowchart of the method for selecting support points in a metric space provided by the first embodiment of the present invention. The method can be applied in any terminal that runs programs, and mainly includes the following steps:
S101: select a datum from the target data set according to a preset rule, and take the selected datum as a support point.
A metric space is a mathematical set in which the distance between any two elements is defined. The preset rule is a predefined rule for choosing a datum: it may choose an arbitrary datum, or it may choose the datum in the target data set that is farthest from the other data. The datum chosen is then taken as the support point.
S102: take the data in the target data set other than the data serving as support points as the data to be calculated.
In practice, the data to be calculated are all the data in the target data set other than those serving as support points.
S103: calculate the distance between each datum to be calculated and the most recently selected support point, and determine a minimum distance value from the comparison between the calculated distance and the pre-stored distance values.
There are multiple data to be calculated, and for each of them the distance to the most recently selected support point must be calculated, so the number of distances calculated equals the number of data to be calculated, which in turn equals the number of data in the target data set that are not support points.
S104: select the datum corresponding to the largest of the minimum distance values, and take the point in the metric space corresponding to the selected datum as a support point.
After step S103, the largest value is chosen from the minimum distance values obtained.
After step S104, the method returns to steps S102 to S104 and continues to select support points from the data in the target data set that are not yet support points.
In this embodiment of the present invention, a datum is selected from the target data set according to a preset rule and taken as a support point; the data in the target data set other than the data serving as support points are taken as the data to be calculated; the distance between each datum to be calculated and the most recently selected support point is calculated, and a minimum distance value is determined from the comparison between the calculated distance and the pre-stored distance values; and the datum corresponding to the largest of the minimum distance values is selected and taken as a support point. Because the peripheral points of a metric space make good support points, computing distances only to a preset number of the most recently selected support points allows the peripheral points to be selected quickly and accurately, which improves the performance of metric-space data management and analysis and the efficiency of building indexes.
Referring to Fig. 2, Fig. 2 is a flowchart of the method for selecting support points in a metric space provided by the second embodiment of the present invention. The method can be applied in any terminal that runs programs, and mainly includes the following steps:
S201: set up a queue for each datum in the target data set.
The number of distances a queue stores is the number of most recently selected support points considered, which is 1 or 2; that is, a queue stores at most 1 or 2 distance values. Note that in the initial state the queue of each datum is empty, i.e. it stores no distance values.
A queue is a special linear list that allows deletion only at the front of the list and insertion only at the rear. The end where insertion takes place is called the tail of the queue, and the end where deletion takes place is called the head. A queue with no elements is called an empty queue, and the data elements of a queue are called queue elements. Inserting an element is called enqueuing, and deleting an element is called dequeuing. Because a queue allows insertion only at one end and deletion only at the other, the element that entered earliest is the first to be deleted, so a queue is also called a first-in, first-out (FIFO) linear list.
In the embodiments of the present invention the queue has this first-in, first-out property: when the number of values stored would exceed the maximum capacity, the distance value enqueued earliest is deleted before the new distance value is enqueued.
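The bounded first-in, first-out behaviour described here can be illustrated with Python's standard `collections.deque`. This is only a sketch of the queue semantics; the capacity of 2 and the sample distance values are made up for the example.

```python
from collections import deque

# A queue that holds at most 2 distance values (the capacity is 1 or 2 in the text).
queue = deque(maxlen=2)

for dist in (5.0, 3.0, 7.0):
    # When the queue is full, append() discards the oldest value (the head)
    # before adding the new one at the tail.
    queue.append(dist)

# The earliest value, 5.0, has been evicted; 3.0 and 7.0 remain.
remaining = list(queue)
```

Using `maxlen` folds the "dequeue the head, then enqueue" step into a single `append` call.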
S202: select a datum from the target data set according to a preset rule, and take the selected datum as a support point.
Optionally, selecting a datum from the target data set according to a preset rule and taking the selected datum as a support point is specifically:
Select the datum at a preset position in the target data set as a reference datum;
Take the datum in the target data set that is farthest from the reference datum as the support point.
The datum at the preset position may be the first datum in the target data set, the last datum in the target data set, or a datum at any position in the target data set. The datum farthest from the reference datum is then selected as the support point; it is a datum of the target data set.
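A minimal sketch of this first preset rule, assuming the data set is a Python list and the distance function is passed in as `metric`; the function name and the `ref_index` parameter are hypothetical.

```python
def farthest_from_reference(data, metric, ref_index=0):
    """Return the index of the datum farthest from the reference datum.

    `ref_index` is the preset position of the reference datum
    (first, last or any other position in the data set).
    """
    ref = data[ref_index]
    return max(range(len(data)), key=lambda i: metric(data[i], ref))
```

For example, with the numbers `[0, 1, 4, 10]` and absolute difference as the distance, the reference at position 0 yields the datum at index 3 as the first support point.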
Optionally, selecting a datum from the target data set according to a preset rule and taking the selected datum as a support point may also be specifically:
Select n data from the target data set;
Calculate, for each of the n data, the variance of the distances from that datum to the points in the metric space corresponding to the other data in the target data set, obtaining n variance values;
Choose the maximum of the n variance values;
Extract the datum corresponding to the maximum variance value, and take the extracted datum as the support point.
The n data are selected arbitrarily from the target data set. The datum extracted is the one corresponding to the maximum variance value.
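The variance-based rule can be sketched as below. The sketch assumes population variance (`statistics.pvariance`) and a seeded random sample for the arbitrary choice of n candidates; the text specifies neither, and the function name and `seed` parameter are hypothetical.

```python
import random
from statistics import pvariance


def max_variance_candidate(data, metric, n, seed=0):
    """Pick n candidates at random and return the index of the one whose
    distances to all other data have the largest variance."""
    rng = random.Random(seed)
    candidates = rng.sample(range(len(data)), n)

    def distance_variance(c):
        # Variance of the distances from candidate c to every other datum.
        return pvariance([metric(data[c], x)
                          for i, x in enumerate(data) if i != c])

    return max(candidates, key=distance_variance)
```

A datum with widely spread distances to the rest of the set tends to lie away from the centre of the data, which is presumably why it is preferred as the first support point.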
S203: take the data in the target data set other than the data serving as support points as the data to be calculated.
In practice, the data to be calculated are all the data in the target data set other than those serving as support points.
S204: calculate the distance between each datum to be calculated and the most recently selected support point, and determine a minimum distance value from the comparison between the calculated distance and the pre-stored distance values.
Optionally, calculating the distance between a datum to be calculated and the most recently selected support point, and determining the minimum distance value from the comparison between the calculated distance and the pre-stored distance values, is specifically:
Let the selected datum to be calculated be the m-th datum;
Calculate the distance between the m-th datum and the most recently selected support point;
Compare the calculated distance with the distances stored in the queue corresponding to the m-th datum, and take the smallest value as the minimum distance value of the m-th datum.
If the queue corresponding to the m-th datum stores no distance, the calculated distance is itself the minimum distance value.
In practice, distances from data to support points in a metric space can be calculated in various ways: different data types use different distance functions, provided that the distance function satisfies the conditions of a metric space. A metric space can be defined as a pair (S, d), where S is a finite non-empty set of data and d is a distance function defined on S with the following three properties:
Positive definiteness: for any x, y ∈ S, d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y;
Symmetry: for any x, y ∈ S, d(x, y) = d(y, x);
Triangle inequality: for any x, y, z ∈ S, d(x, y) + d(y, z) ≥ d(x, z).
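The three properties can be checked by brute force on a small sample of points. The helper below is an illustrative sketch, not part of the described method; its name and the tolerance `tol` are assumptions, and passing the check on a sample does not prove that a function is a metric on the whole set.

```python
from itertools import product


def is_metric_on(points, d, tol=1e-12):
    """Check positive definiteness, symmetry and the triangle inequality
    for distance function d on every combination of the given points."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:
            return False                      # distances must be non-negative
        if (d(x, y) <= tol) != (x == y):
            return False                      # zero distance only for x == y
        if abs(d(x, y) - d(y, x)) > tol:
            return False                      # symmetry
    for x, y, z in product(points, repeat=3):
        if d(x, y) + d(y, z) < d(x, z) - tol:
            return False                      # triangle inequality
    return True
```

For instance, absolute difference on numbers passes the check, whereas squared difference fails the triangle inequality on the points 0, 1 and 3 (1 + 4 < 9).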
Three data types and their corresponding distance functions are given below:
The first data type is vector (Vector) data; the distance function for Vector data is the Euclidean distance.
The second data type is picture (Image) data; the distance function for Image data is an L-family distance (linear combination). Image data are data obtained after feature extraction and consist of three feature sets representing the structure, texture and color of the picture. Each feature set can be regarded as a vector. For example, the structure feature set has size 3, the texture feature set has size 48 and the color feature set has size 15, so a picture can be represented as a 66-dimensional vector. Each feature set has its own distance function: texture and structure use the L2 distance, and color uses the L1 distance. The distance between Image data is a linear combination of the feature-set distances. The L-family distance formula is as follows: L_s(x, y) = (Σ_{i=1..n} |x_i − y_i|^s)^(1/s), where s = 1 gives the L1 distance and s = 2 gives the L2 (Euclidean) distance.
The third data type is gene data; the distance function used for gene data is global alignment.
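The vector and image distance functions can be sketched as follows. The ordering of the three feature sets inside the 66-dimensional vector and the combination weights are not given in the text, so both are assumptions of this example; global alignment for gene data is not sketched.

```python
def l_distance(x, y, s):
    """L_s (Minkowski) distance between two equal-length vectors."""
    return sum(abs(a - b) ** s for a, b in zip(x, y)) ** (1.0 / s)


def euclidean(x, y):
    """Vector data: Euclidean distance, i.e. the L2 distance."""
    return l_distance(x, y, 2)


def image_distance(x, y, weights=(1.0, 1.0, 1.0)):
    """Linear combination of per-feature-set distances for image vectors.

    Assumed layout: structure = x[0:3], texture = x[3:51], color = x[51:66].
    The weights are placeholders; the text does not state them.
    """
    w_structure, w_texture, w_color = weights
    return (w_structure * l_distance(x[0:3], y[0:3], 2)       # structure: L2
            + w_texture * l_distance(x[3:51], y[3:51], 2)     # texture: L2
            + w_color * l_distance(x[51:66], y[51:66], 1))    # color: L1
```

Any of these functions can be passed wherever a distance function is expected, since the selection procedure itself only relies on the metric properties.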
S205: select the datum corresponding to the largest of the minimum distance values, and take the selected datum as a support point.
After step S205, the method returns to steps S203 to S205 and continues to select support points from the data in the target data set that are not yet support points.
For example, selecting the datum corresponding to the largest of the minimum distance values is specifically:
In the initial state, i.e. before step S201, a reference value less than 0 is preset. When the minimum distance value a of a datum is first obtained in step S204, a is compared with the reference value; since a distance value is never negative, a is greater than the reference value, and the reference value is updated to a. When the minimum distance value b of another datum is next obtained in step S204, b is compared with the updated reference value, and so on. This is equivalent to picking the largest of all the minimum distance values obtained. A concrete application scenario illustrating this is given below.
The above description is illustrated with a practical application scenario as follows:
Suppose the target data set data contains four data, data0, data1, data2 and data3, whose corresponding queues are queue0, queue1, queue2 and queue3, each storing at most 1 or 2 values; let the most recently selected support point be lastpoint, and let the reference value be maxdis = -1.
Step 1: select datum data0 from data according to the preset rule, take data0 as a support point and add it to the result set;
Step 2: let lastpoint be the support point most recently added to the result set;
Step 3: take the data that are not support points as the data to be calculated;
Step 4: when the datum to be calculated is data1, calculate the distance dis1 from data1 to lastpoint, where lastpoint = data0 is the support point most recently added to the result set;
Step 5: the queue queue1 of data1 is empty at this point, i.e. it stores no distance values, so dis1 is the minimum distance value mindis;
Step 6: compare mindis with maxdis. Since mindis = dis1 > 0 and maxdis = -1 < mindis, assign the value of mindis to maxdis, i.e. maxdis = mindis = dis1, and let nextpoint = data1;
Step 7: determine whether queue1 already stores the maximum number of distance values; if not, enqueue dis1 into queue1; if so, dequeue the head of queue1 and then enqueue dis1;
Note that in step 5, if queue1 is not empty, mindis is initially the minimum value in queue1, and dis1 is compared with it: if dis1 < mindis, then mindis = dis1; if dis1 > mindis, mindis is unchanged. Step 6 still compares mindis with maxdis, but in the latter case mindis is the minimum value in queue1 rather than dis1. dis1 is then enqueued into queue1 as described in step 7.
Steps 4 to 7 form the inner loop: after data1 has been processed, the whole target data set is traversed using the computation described in steps 3 to 7, processing data2 next.
Steps 4 to 7 are then repeated for data2, as follows:
Step 4: when the datum to be calculated is data2, calculate the distance dis2 from data2 to lastpoint; since there is no support point other than data0, lastpoint = data0;
Step 5: the queue queue2 of data2 is empty at this point, i.e. it stores no distance values, so dis2 is the minimum distance value mindis, i.e. mindis = dis2;
Step 6: compare mindis with maxdis. After data1 was processed, maxdis = dis1, so the comparison is between dis1 and dis2. Suppose dis1 < dis2; then maxdis < mindis, so assign the value of mindis to maxdis, i.e. maxdis = mindis = dis2, and now nextpoint = data2 rather than data1;
Step 7: determine whether queue2 already stores the maximum number of distance values; if not, enqueue dis2 into queue2; if so, dequeue the head of queue2 and then enqueue dis2;
Note that in step 6, if instead dis1 > dis2, then maxdis > mindis, so maxdis is unchanged, maxdis = dis1 and nextpoint = data1; step 7 must still be performed, i.e. dis2 is enqueued into queue2.
Steps 4 to 7 are then repeated for data3, as follows:
Step 4: when the datum to be calculated is data3, calculate the distance dis3 from data3 to lastpoint; since there is no support point other than data0, lastpoint = data0;
Step 5: the queue queue3 of data3 is empty at this point, i.e. it stores no distance values, so dis3 is the minimum distance value mindis, i.e. mindis = dis3;
Step 6: compare mindis with maxdis. After data2 was processed, maxdis = dis2 or maxdis = dis1, so the comparison is between dis2 and dis3 or between dis1 and dis3. The larger value becomes maxdis and the datum with the larger distance becomes nextpoint; that is, nextpoint may be any one of data1, data2 and data3;
Step 7: determine whether queue3 already stores the maximum number of distance values; if not, enqueue dis3 into queue3; if so, dequeue the head of queue3 and then enqueue dis3;
After all the data in data that are not support points have been traversed, the following step is performed:
Step 8: add nextpoint to the result set, whose members are the support points. If nextpoint = data3, data3 is taken as a support point and added to the result set, which now contains data0 and data3.
Steps 3 to 8 form the outer loop: after a support point is selected, the method returns to step 3, takes the data that are not support points as the data to be calculated, runs the inner loop of steps 4 to 7 over all of them, and finally performs step 8 to select another support point. Note that lastpoint changes between traversals: for example, after step 8 above the datum most recently added to the result set is data3, so in the second traversal lastpoint = data3. The third, fourth and later traversals proceed likewise and are not described further.
In practice, a maximum number of support points can be preset, i.e. the number of support points selected from a target data set equals the preset maximum number of support points, which is an integer greater than 0.
Note that when the preset maximum number of support points is 1, the result set contains only the support point data0 selected in step 1, and steps 2 to 8 are not performed.
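Steps 1 to 8 of the scenario above can be sketched in Python as follows. This is one reading of the walkthrough rather than the patent's own program: the preset rule is assumed to be "take the first datum", maxdis is assumed to be reset to -1 at the start of every outer round, ties keep the earlier datum, and the function and parameter names (`select_support_points`, `point_num`, `recent_size`) are chosen for the example.

```python
from collections import deque


def select_support_points(data, metric, point_num, recent_size=1):
    """Return the indices of the selected support points, in selection order."""
    if point_num < 1 or not data:
        return []
    # One bounded FIFO queue of recent distances per datum (initially empty).
    queues = [deque(maxlen=recent_size) for _ in data]
    result = [0]                       # Step 1: assumed preset rule, first datum
    chosen = {0}
    while len(result) < min(point_num, len(data)):
        last = result[-1]              # Step 2: most recently selected support point
        max_dis, next_point = -1.0, None
        for i, item in enumerate(data):        # Step 3: skip support points
            if i in chosen:
                continue
            dis = metric(item, data[last])     # Step 4
            # Step 5: smallest of the stored distances and the new distance.
            min_dis = min(min(queues[i], default=dis), dis)
            if min_dis > max_dis:              # Step 6: track the running maximum
                max_dis, next_point = min_dis, i
            queues[i].append(dis)              # Step 7: enqueue, evicting the head if full
        result.append(next_point)              # Step 8
        chosen.add(next_point)
    return result
```

With the numbers `[0.0, 1.0, 4.0, 10.0]`, absolute difference as the distance and `point_num=3`, the first round picks index 3 (distance 10 from the first datum) and the second round picks index 2, whose minimum of the stored distance 4 and the new distance 6 is the largest among the remaining data.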
The program with which the whole support-point selection algorithm described above runs in the terminal is given below:
Input: data set data,
distance function metric,
maximum number of support points to select pointNum
Output: result set resultSet
Continue executing steps 4 to 17;
Here maxdis = -1 is the preset reference value described for step S205 above.
Note that the inner loop runs from step 5 to step 12 and the outer loop from step 4 to step 17; recentsize in step 1 is the number of most recently selected support points described in step S201 of this embodiment, i.e. recentsize = 1 or 2.
Note that if recentsize > 2, the selected support points include both peripheral points and interior points of the metric space; interior points are not good support points, so recentsize should only be 1 or 2.
In the embodiment of the present invention, it is that target data concentrates each data to arrange queue, according to presetting rule in target data Concentrate and choose data, and the data selected are concentrated the target data except the data as the strong point as the strong point Outside other data as data to be calculated, calculate the distance of the strong point that the data to be calculated are selected with last time, and Lowest distance value is determined by the distance and the comparative result of the distance value for prestoring that calculate, number in the lowest distance value is chosen The maximum corresponding data of lowest distance value of value, and using the data selected as the strong point, as the peripheral point of metric space can Using as the preferably strong point, therefore can be quickly and accurate by calculating is only carried out with the strong point of the preset number selected recently The true peripheral point for selecting metric space, improves the performance of metric space data management analysis, while improve setting up index Efficiency.
Referring to Fig. 3, Fig. 3 is a schematic structural diagram of the device for selecting support points in a metric space provided by the third embodiment of the present invention. For ease of description, only the parts relevant to this embodiment are shown. The device illustrated in Fig. 3 may be the executing entity of the method for selecting support points in a metric space provided by the embodiments shown in Fig. 1 and Fig. 2. The device illustrated in Fig. 3 mainly includes a selection module 301 and a calculation module 302. The functional modules are described in detail as follows:
The selection module 301 is configured to choose one data item from the target data set according to a preset rule, and to take the chosen data item as a support point;
The selection module 301 is further configured to take the other data in the target data set, other than the data serving as support points, as data to be calculated;
The calculation module 302 is configured to calculate the distance between the data to be calculated and the most recently selected support point, and to determine a minimum distance value by comparing the calculated distance with the prestored distance values;
The selection module 301 is further configured to choose the data item corresponding to the largest of the minimum distance values, and to take the chosen data item as a support point.
The preset rule is a predefined rule for choosing data. Under the preset rule, the data item may be chosen arbitrarily, or it may be the data item in the target data set that is farthest from the other data. The chosen data item is then taken as the support point. There are multiple data items to be calculated, and for each of them the distance to the most recently selected support point must be calculated, so the number of calculated distances equals the number of data items to be calculated. The number of data items to be calculated equals the number of data items in the target data set that do not serve as support points.
For details not exhaustively described in this embodiment, refer to the description of the embodiment shown in Fig. 1; they are not repeated here.
It should be noted that, in the embodiment of the device illustrated in Fig. 3 above, the division into functional modules is merely an example. In practical applications, the above functions may be allocated to different functional modules as needed, for example in view of the configuration requirements of the corresponding hardware or the convenience of software implementation; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. Moreover, in practical applications, the corresponding functional modules in this embodiment may be implemented by corresponding hardware, or by corresponding hardware executing corresponding software. The principle described above applies to every embodiment provided in this specification and is not repeated below.
In this embodiment of the present invention, the selection module 301 chooses one data item from the target data set according to a preset rule and takes the chosen data item as a support point, and then takes the other data in the target data set, other than the data serving as support points, as data to be calculated; the calculation module 302 calculates the distance between the data to be calculated and the most recently selected support point, and determines a minimum distance value by comparing the calculated distance with the prestored distance values; the selection module 301 chooses the data item corresponding to the largest of the minimum distance values and takes it as a support point. Because peripheral points of the metric space make good support points, performing the calculation only against the preset number of most recently selected support points allows the peripheral points of the metric space to be selected quickly and accurately, which improves the performance of metric-space data management and analysis and also improves the efficiency of index construction.
Referring to Fig. 4, Fig. 4 is a schematic structural diagram of the device for selecting support points in a metric space provided by the fourth embodiment of the present invention. For ease of description, only the parts relevant to this embodiment are shown. The device illustrated in Fig. 4 may be the executing entity of the method for selecting support points in a metric space provided by the embodiments shown in Fig. 1 and Fig. 2. The device illustrated in Fig. 4 mainly includes a setting module 401, a selection module 402 and a calculation module 403. The functional modules are described in detail as follows:
The setting module 401 is configured to set a queue for each data item in the target data set.
The number of distances stored in each queue equals the number of most recently selected support points that are taken into account. This number is 1 or 2, that is, each queue stores 1 or 2 distances.
A queue is a special kind of linear list that allows deletion only at the front of the list and insertion only at the rear. The end at which insertion takes place is called the tail of the queue, and the end at which deletion takes place is called the head. A queue containing no elements is called an empty queue. The data elements of a queue are also called queue elements. Inserting a queue element is called enqueuing, and deleting a queue element is called dequeuing. Because a queue allows insertion only at one end and deletion only at the other, the element that entered the queue earliest is the first that can be deleted; a queue is therefore also called a first-in, first-out (FIFO) linear list.
In the embodiments of the present invention, the queue likewise has this first-in, first-out property: when the number of stored distance values would exceed the maximum capacity, the distance value that was enqueued first is deleted, and then the new distance value is enqueued.
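The first-in, first-out behaviour described above can be illustrated with a minimal sketch. The helper name `enqueue_distance` and the parameter `capacity` are hypothetical; `collections.deque` is used here only as a convenient queue type.

```python
from collections import deque


def enqueue_distance(queue, distance, capacity):
    """Enqueue distance at the tail; if the queue is full, first delete the head."""
    if len(queue) >= capacity:
        queue.popleft()        # the distance that was enqueued first leaves first
    queue.append(distance)     # the new distance joins at the tail
    return queue


q = deque()
for d in (0.7, 0.4, 0.9):
    enqueue_distance(q, d, capacity=2)
# q now holds the two most recent distances, 0.4 and 0.9
```

The same effect can be obtained with `deque(maxlen=capacity)`, which discards the oldest element automatically on `append`.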
The selection module 402 is configured to choose one data item from the target data set according to a preset rule, and to take the chosen data item as a support point.
Optionally, the selection module 402 is further configured to choose the data item at a preset position in the target data set as reference data;
The selection module 402 is further configured to take the data item in the target data set that is farthest from the reference data as the support point.
The data item at the preset position may be the first data item in the target data set, the last data item in the target data set, or a data item at any position in the target data set. The data item farthest from the reference data is then selected as the support point; this farthest data item is one data item in the target data set.
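A minimal sketch of this rule for choosing the first support point follows. The function name `farthest_from_reference`, the default preset position 0, and the Euclidean metric are assumptions for illustration.

```python
from math import dist  # Euclidean distance, standing in for any metric


def farthest_from_reference(data, ref_index=0, metric=dist):
    """Return the index of the data item farthest from the reference item."""
    ref = data[ref_index]
    others = (i for i in range(len(data)) if i != ref_index)
    return max(others, key=lambda i: metric(data[i], ref))
```

Passing `ref_index=len(data) - 1` corresponds to taking the last data item of the set as the reference data.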
Optionally, the selection module 402 includes a selecting submodule 4021, a calculating submodule 4022 and an extracting submodule 4023.
The selecting submodule 4021 is configured to select n data items from the target data set;
The calculating submodule 4022 is configured to calculate, for each of the n data items, the variance of the distances between that data item and the data in the target data set other than the n data items, yielding n variance values;
The selecting submodule 4021 is further configured to choose the maximum variance value among the n variance values;
The extracting submodule 4023 is configured to extract the data item corresponding to the maximum variance value, and to take the extracted data item as the support point.
The selecting submodule 4021 selects the n data items arbitrarily from the target data set. The data item extracted by the extracting submodule 4023 is the one corresponding to the maximum variance value.
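The variance-based rule can be sketched as follows, under stated assumptions: the name `max_variance_support_point` is hypothetical, the population variance (`statistics.pvariance`) is used because the description does not say which variance is meant, and each candidate's distances are taken to all other data items in the set, which is one possible reading of the text.

```python
import random
from math import dist
from statistics import pvariance


def max_variance_support_point(data, n, metric=dist, seed=None):
    """Pick n candidates at random and return the one whose distances vary most."""
    rng = random.Random(seed)
    candidates = rng.sample(range(len(data)), n)   # n arbitrarily selected items

    def distance_variance(i):
        # Variance of the distances from candidate i to every other data item.
        others = [metric(data[i], data[j]) for j in range(len(data)) if j != i]
        return pvariance(others)

    return max(candidates, key=distance_variance)
```

A candidate whose distances to the rest of the data are widely spread separates the data well, which is the motivation for preferring the largest variance.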
The selection module 402 is further configured to take the other data in the target data set, other than the data serving as support points, as data to be calculated.
In practical applications, the data to be calculated are the data in the target data set other than the data serving as support points.
The calculation module 403 is configured to calculate the distance between the data to be calculated and the most recently selected support point, and to determine a minimum distance value by comparing the calculated distance with the prestored distance values.
Optionally, the calculation module 403 includes a setting submodule 4031, a computing submodule 4032 and a choosing submodule 4033;
The setting submodule 4031 is configured to designate the selected data item to be calculated as the m-th data item;
The computing submodule 4032 is configured to calculate the distance between the m-th data item and the most recently selected support point;
The choosing submodule 4033 is configured to compare the calculated distance with the distances stored in the queue corresponding to the m-th data item, and to select the smallest distance as the minimum distance value of the m-th data item.
If no distance is stored in the queue corresponding to the m-th data item, the choosing submodule 4033 takes the calculated distance directly as the minimum distance value.
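The per-item update performed by these submodules might look like the sketch below. The helper name `update_min_distance` and the parameter `capacity` are hypothetical, and the order of operations (delete the oldest distance if the queue is full, compare, then enqueue) is an assumption; the description states the comparison and the first-in, first-out deletion but not how they interleave.

```python
from collections import deque


def update_min_distance(queue, new_distance, capacity):
    """Return the minimum distance value for one data item and update its queue."""
    if len(queue) >= capacity:
        queue.popleft()                      # make room: drop the oldest distance
    # With an empty queue the new distance is itself the minimum distance value.
    smallest = min([new_distance, *queue])
    queue.append(new_distance)               # store the distance for later rounds
    return smallest
```

For example, starting from an empty queue with `capacity=2`, successive calls with 0.5, 0.8 and 0.9 return 0.5, 0.5 and 0.8 and leave 0.8 and 0.9 in the queue.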
For details not exhaustively described in this embodiment, refer to the descriptions of the embodiments shown in Fig. 1 and Fig. 2; they are not repeated here.
In this embodiment of the present invention, the setting module 401 sets a queue for each data item in the target data set; the selection module 402 chooses one data item from the target data set according to a preset rule, takes the chosen data item as a support point, and takes the other data in the target data set, other than the data serving as support points, as data to be calculated; the calculation module 403 calculates the distance between the data to be calculated and the most recently selected support point, and determines a minimum distance value by comparing the calculated distance with the prestored distance values. Because peripheral points of the metric space make good support points, performing the calculation only against the preset number of most recently selected support points allows the peripheral points of the metric space to be selected quickly and accurately, which improves the performance of metric-space data management and analysis and also improves the efficiency of index construction.
Simulation results are provided below to show that the method described in the embodiments of the present invention is superior to the existing FFT algorithm. This is shown in two respects: the speed of selecting the optimal support points, and the accuracy of selecting good support points (peripheral points of the metric space).
First aspect: comparison of the speed of selecting the optimal support points:
The support point selection method described in the embodiments of the present invention and the FFT algorithm have the same time complexity, and the execution time of an algorithm is also affected by factors such as code implementation and machine conditions. Running time is therefore not used to compare the speed of the two; instead, the number of selections made before the optimal support points are found is used as the measure of selection speed. With all other conditions equal, a larger number of selections indicates a slower speed.
As shown in Table 2, the data are two-dimensional UV (Uniform Vector) data, and results are given for query radii from 0.03 to 0.07. Each value is the number of support point selections performed before the optimal support points are found, that is, the total number of points selected up to that moment. The smaller the value, the faster the speed. For brevity, the support point selection method described in the embodiments of the present invention is abbreviated as the Recent Farthest Traversal (RFT) algorithm; RFT with recentSize = 1 is denoted RFT1, and RFT with recentSize = 2 is denoted RFT2.
To select the optimal support points from 1000 data items, at most 1000 selections are needed. Table 2 shows that the FFT algorithm needs 628 selections on average to find the optimal support points, which approaches O(n²) time complexity. RFT2 needs only 28.8 selections on average to complete the same work. RFT1 is somewhat slower than RFT2, needing 65.4 selections on average, but is still much faster than the FFT algorithm. The results for radius 0.03 and radius 0.05 are identical because the optimal support points are the same for these two radii. The conclusions for the Image data type are similar to those for UV and are not repeated here.
Second aspect: the accuracy of selecting good support points (peripheral points of the metric space):
First, the top m best-performing support point combinations are chosen, and the support points of these m combinations form a set P. The first n of the support points selected by FFT or RFT are then taken, and these n support points form a set Q. The accuracy h can then be defined by the following formula:
The size of set P is determined by m, and the size of set Q equals n (the support points selected by FFT or RFT are never repeated). The accuracy changes as m and n change; m and n are fixed, with a choice suited to each emphasis, so that the two algorithms are comparable. With m fixed, as n gradually increases, the accuracy h approaches 1. When the data dimension increases while the data volume stays constant, the data distribution becomes sparser, so the value of n is increased appropriately to make the results more intuitive. The higher the accuracy h, the more good support points the algorithm selects, and hence the greater the probability of obtaining high search performance.
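The formula for h is not reproduced in this text. One form consistent with the behaviour described above (h approaches 1 as n grows with m fixed) would be h = |P ∩ Q| / |P|; this is an assumption and not necessarily the formula of the original filing. Under that assumption, and with hypothetical names `accuracy`, `best_combinations` and `selected`, the computation could be sketched as:

```python
def accuracy(best_combinations, selected, n):
    """Assumed accuracy h = |P & Q| / |P| for the first n selected support points."""
    # P: every support point appearing in the top-m best-performing combinations.
    p = set().union(*best_combinations)
    if not p:
        raise ValueError("at least one non-empty combination is required")
    # Q: the first n support points selected by FFT or RFT (no repeats).
    q = set(selected[:n])
    return len(p & q) / len(p)
```

For instance, with combinations (1, 2), (2, 3) and (1, 4), set P is {1, 2, 3, 4}; if the selection order is 4, 9, 1, 7, 2, 3, then n = 3 gives h = 0.5 and n = 6 gives h = 1.0.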
Experiments were carried out on two data types, Vector and Image. Fig. 5 is a schematic diagram comparing the selection accuracy of the FFT algorithm and RFT on UV data. In Fig. 5, the horizontal axis represents the query radius, for which ten radii from 0.01 to 0.10 at intervals of 0.01 are used; the vertical axis represents the accuracy. m is set to 100 and n to 40.
As Fig. 5 shows, the accuracy of the FFT algorithm stays steady at around 10%; the accuracy of RFT2 is never below 70% and reaches as high as 92%; the accuracy of RFT1 fluctuates between 30% and 40%. It should be noted that the curves in Fig. 5 fluctuate because the optimal top-m support point combinations differ from radius to radius, that is, the size of set P differs. When reading Fig. 5, the FFT algorithm and RFT should be compared at the same radius for the comparison to be meaningful. Comparisons at multiple radii are given to show that the superiority of RFT over the FFT algorithm is the general case and not an isolated example.
Fig. 6 is a schematic diagram comparing the selection accuracy of the FFT algorithm and RFT on Image data, where the query radius ranges from 0.01 to 0.06, m is set to 100 and n to 60. Although the accuracy of RFT is not as high as on the Vector data type, it is still four to five times that of FFT.
The above experimental results show that the support point selection method described in the embodiments of the present invention is superior to the prior-art FFT algorithm in both speed and accuracy.
In the several embodiments provided in this application, it should be understood that the disclosed system, device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into modules is only a division by logical function, and other divisions are possible in actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling, direct coupling or communication link shown or discussed may be an indirect coupling or communication link through some interfaces, devices or modules, and may be electrical, mechanical or of another form.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
It should be noted that, for ease of description, each of the foregoing method embodiments is expressed as a series of combined actions, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
The above is a description of the method and device for selecting support points in a metric space provided by the present invention. Those skilled in the art may, in accordance with the idea of the embodiments of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A method for selecting support points in a metric space, characterized by comprising:
choosing one data item from a target data set according to a preset rule, and taking the chosen data item as a support point;
taking the other data in the target data set, other than the data serving as support points, as data to be calculated;
calculating the distance between the data to be calculated and the most recently selected support point, and determining a minimum distance value by comparing the calculated distance with the prestored distance values;
choosing the data item corresponding to the largest of the minimum distance values, and taking the chosen data item as a support point.
2. The method according to claim 1, characterized in that choosing one data item from the target data set according to the preset rule and taking the chosen data item as a support point specifically comprises:
choosing the data item at a preset position in the target data set as reference data;
taking the data item in the target data set that is farthest from the reference data as the support point.
3. The method according to claim 1, characterized in that choosing one data item from the target data set according to the preset rule and taking the chosen data item as a support point specifically comprises:
selecting n data items from the target data set;
calculating, for each of the n data items, the variance of the distances between that data item and the data in the target data set other than the n data items, yielding n variance values;
choosing the maximum variance value among the n variance values;
extracting the data item corresponding to the maximum variance value, and taking the extracted data item as the support point.
4. The method according to claim 2 or 3, characterized in that, before choosing one data item from the target data set according to the preset rule and taking the chosen data item as a support point, the method further comprises:
setting a queue for each data item in the target data set, wherein the number of distances stored in the queue is the number of most recently selected support points, and the number of most recently selected support points is 1 or 2.
5. The method according to claim 4, characterized in that calculating the distance between the data to be calculated and the most recently selected support point, and determining a minimum distance value by comparing the calculated distance with the prestored distance values, specifically comprises:
designating the selected data item to be calculated as the m-th data item;
calculating the distance between the m-th data item and the most recently selected support point;
comparing the calculated distance with the distances stored in the queue corresponding to the m-th data item, and selecting the smallest distance as the minimum distance value of the m-th data item.
6. A device for selecting support points in a metric space, characterized in that the device comprises:
a selection module, configured to choose one data item from a target data set according to a preset rule, and to take the chosen data item as a support point;
the selection module being further configured to take the other data in the target data set, other than the data serving as support points, as data to be calculated;
a calculation module, configured to calculate the distance between the data to be calculated and the most recently selected support point, and to determine a minimum distance value by comparing the calculated distance with the prestored distance values;
the selection module being further configured to choose the data item corresponding to the largest of the minimum distance values, and to take the chosen data item as a support point.
7. The device according to claim 6, characterized in that:
the selection module is further configured to choose the data item at a preset position in the target data set as reference data;
the selection module is further configured to take the data item in the target data set that is farthest from the reference data as the support point.
8. The device according to claim 6, characterized in that the selection module comprises:
a selecting submodule, configured to select n data items from the target data set;
a calculating submodule, configured to calculate, for each of the n data items, the variance of the distances between that data item and the data in the target data set other than the n data items, yielding n variance values;
the selecting submodule being further configured to choose the maximum variance value among the n variance values;
an extracting submodule, configured to extract the data item corresponding to the maximum variance value, and to take the extracted data item as the support point.
9. The device according to claim 7 or 8, characterized in that the device further comprises:
a setting module, configured to set a queue for each data item in the target data set, wherein the number of distances stored in the queue is the number of most recently selected support points, and the number of most recently selected support points is 1 or 2.
10. The device according to claim 9, characterized in that the calculation module comprises:
a setting submodule, configured to designate the selected data item to be calculated as the m-th data item;
a computing submodule, configured to calculate the distance between the m-th data item and the most recently selected support point;
a choosing submodule, configured to compare the calculated distance with the distances stored in the queue corresponding to the m-th data item, and to select the smallest distance as the minimum distance value of the m-th data item.
CN201610987363.8A 2016-11-08 2016-11-08 The choosing method and device of supporting point in metric space Active CN106528790B (en)

Publications (2)

Publication Number Publication Date
CN106528790A true CN106528790A (en) 2017-03-22
CN106528790B CN106528790B (en) 2019-08-16

Family

ID=58351082

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610987363.8A Active CN106528790B (en) 2016-11-08 2016-11-08 The choosing method and device of supporting point in metric space

Country Status (1)

Country Link
CN (1) CN106528790B (en)

Also Published As

Publication number Publication date
CN106528790B (en) 2019-08-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant