CN117349518A - Method, device, computer equipment and storage medium for determining furthest adjacent candidate point - Google Patents

Info

Publication number
CN117349518A
Authority
CN
China
Prior art keywords
point
neighbor
target
target query
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311266943.4A
Other languages
Chinese (zh)
Inventor
冯小康
王江
孙华锦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Original Assignee
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority to CN202311266943.4A
Publication of CN117349518A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of big data and discloses a method, a device, computer equipment and a storage medium for determining the furthest adjacent candidate point. The method obtains a target query point, a target data set, a neighbor graph, a far-neighbor graph and product quantization codes. A first starting point in the target data set is selected as the nearest neighbor candidate point. A first group of neighbor points is extracted from the neighbor graph, and the distance between the product quantization codes of the target query point and of each neighbor point is determined. A second group of neighbor points is selected according to these distances. A first target candidate point is then selected according to the original distance between each neighbor point in the second group and the target query point. When the original distance between the first target candidate point and the target query point is greater than or equal to the original distance between the first starting point and the target query point, the furthest adjacent candidate point is determined according to the first target candidate point, the target query point and the far-neighbor graph. The invention can improve the efficiency of the furthest adjacent search.

Description

Method, device, computer equipment and storage medium for determining furthest adjacent candidate point
Technical Field
The invention relates to the technical field of big data, in particular to a method, a device, computer equipment and a storage medium for determining the furthest adjacent candidate point.
Background
In the big data age, applications record the daily behavior of users, analyze user preferences and recommend information of interest to users. In this process, data of interest to the user is generally queried in a database through a corresponding algorithm, and the queried data is recommended to the user.
One algorithm is the furthest neighbor search algorithm, and the specific use process can be as follows:
generating a first feature vector according to the behavior record of user A, randomly selecting the feature vector of any user in a database as a starting vector, and, taking the starting vector as a starting point, searching the database for a second feature vector that is farthest from the first feature vector. Since user B, to whom the second feature vector corresponds, is the user least similar to user A, the data that user B dislikes may be data that user A likes, and the data that user B dislikes is therefore recommended to user A.
In the furthest neighbor search algorithm, the feature vectors are high-dimensional and the database contains a large amount of data, so the furthest neighbor search is inefficient.
Disclosure of Invention
In view of the above, the present invention provides a method, apparatus, computer device and storage medium for determining the furthest neighbor candidate point, so as to solve the problem of low furthest neighbor searching efficiency.
In a first aspect, the present invention provides a method of determining a furthest adjacent candidate point, the method comprising:
acquiring a target query point, a target data set, a neighbor graph corresponding to the target data set, a far neighbor graph and a product quantization code of each data point in the target data set;
selecting a first starting point from the target data set as a nearest neighbor candidate point corresponding to the target query point, wherein the first starting point is any data point in the target data set;
extracting a first group of neighbor points corresponding to the first starting point from the neighbor map, and determining the distance between the product quantization codes of the target query point and each neighbor point in the first group of neighbor points;
selecting a second group of neighbor points closest to the target query point from the first group of neighbor points according to the distance between the product quantization codes of the target query point and each neighbor point in the first group of neighbor points;
Selecting a first target candidate point with the minimum original distance from the second group of neighbor points according to the original distance between each neighbor point in the second group of neighbor points and the target query point;
and when the original distance between the first target candidate point and the target query point is larger than or equal to the original distance between the first starting point and the target query point, determining the furthest adjacent candidate point according to the first target candidate point, the target query point and the far-adjacent graph.
The method for determining the furthest adjacent candidate point has the following advantages:
Because the target data set contains a large number of data points and existing furthest adjacent algorithms are not mature enough, the neighbor points determined for each data point from the far-adjacent graph have low accuracy. If a point is randomly selected from the far-adjacent graph as the starting point of the furthest adjacent search, the search is largely blind: many iterations may be needed to find the furthest adjacent point of the target query point, and the search efficiency is extremely low. By accurately selecting the starting point, the method reduces the number of iterations in the furthest adjacent search and greatly improves its efficiency.
In an alternative embodiment, when the first original distance between the first target candidate point and the target query point is greater than or equal to the second original distance between the first starting point and the target query point, determining the farthest neighboring candidate point according to the first target candidate point, the target query point, and the far-neighbor graph includes:
determining a third group of neighbor points corresponding to the first target candidate point in the far-neighbor graph;
determining an original distance between each neighbor point in the third set of neighbor points and the target query point;
and selecting the neighbor point with the largest original distance from the third group of neighbor points as the farthest neighbor candidate point according to the original distance between each neighbor point in the third group of neighbor points and the target query point.
Specifically, the original distance between two data points represents the real distance between them, and the nearest neighbor point of the target query point can be accurately determined through the neighbor graph and the product quantization coding technique. Therefore, by using the nearest neighbor point in place of the target query point and performing the furthest neighbor search with the original distance, the furthest neighbor candidate point can be determined efficiently and accurately.
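The three steps of this embodiment can be illustrated with a minimal sketch. It assumes the far-neighbor graph is stored as a dictionary mapping a point index to a list of point indices; that layout, the sample data and the names `far_graph` and `farthest_candidate` are illustrative assumptions, not part of the disclosure.

```python
import math

def euclidean(a, b):
    """Original (Euclidean) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def farthest_candidate(first_target, query, dataset, far_graph):
    """Among the far neighbors of first_target (the third group),
    return the index with the largest original distance to the query."""
    third_group = far_graph[first_target]
    return max(third_group, key=lambda i: euclidean(dataset[i], query))

# Hypothetical data: four 2-dimensional points and a hand-written far-neighbor graph.
dataset = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (0.0, 8.0)]
far_graph = {0: [2, 3], 1: [2, 3], 2: [3, 0], 3: [2, 0]}
query = (0.2, 0.1)

# Point 0 plays the role of the first target candidate point (nearest to the query).
candidate = farthest_candidate(0, query, dataset, far_graph)
print(candidate)
```

Only the points listed for the first target candidate in the far-neighbor graph are compared with the query, so the cost of this step depends on the size of the third group rather than on the size of the data set.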
In an alternative embodiment, when the original distance between the first target candidate point and the target query point is smaller than the original distance between the first starting point and the target query point, the method further comprises:
selecting a fourth group of neighbor points corresponding to the first target candidate point from the neighbor map, and determining the distance between the product quantization codes of the target query point and each neighbor point in the fourth group of neighbor points;
selecting a fifth group of neighbor points closest to the target query point from the fourth group of neighbor points according to the distance between the product quantization codes of the target query point and each neighbor point in the fourth group of neighbor points;
determining an original distance between each neighbor point in the fifth set of neighbor points and the target query point;
selecting a second target candidate point with the minimum original distance from the fifth group of neighbor points according to the original distance between each neighbor point in the fifth group of neighbor points and the target query point;
so as to determine the furthest adjacent candidate point according to the original distance between the second target candidate point and the target query point, the original distance between the first starting point and the target query point and the far-adjacent graph;
Or,
and determining the furthest adjacent candidate point according to the original distance between the second target candidate point and the target query point, the original distance between the first starting point and the target query point, the far-adjacent graph and the neighbor graph.
Specifically, because the number of data points in the target data set is large, the probability of randomly selecting one data point in the target data set as the nearest neighbor point of the target query point is obviously extremely low, so that multiple iterative computations are required to find the accurate nearest neighbor point corresponding to the target query point. Further, by using the accurate nearest neighbor point as the starting point of the furthest neighbor search, the efficiency of the furthest neighbor search can be improved.
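The iterative nearest neighbor search described in this embodiment can be sketched as a greedy descent on the neighbor graph. The sketch rests on several assumptions that the text does not fix: the product quantization distance is computed from the raw query to the decoded code of each neighbor, the loop stops when the best candidate is no closer than the current point, and the data, graph, codes and the names `decode`, `nearest_by_descent` and `keep` are all hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def decode(code, codebooks):
    """Concatenate the cluster centers named by a 1-based PQ code."""
    out = []
    for i, idx in enumerate(code):
        out.extend(codebooks[i][idx - 1])
    return out

def nearest_by_descent(query, start, dataset, neighbor_graph, codes, codebooks, keep=2):
    current = start
    while True:
        group = neighbor_graph[current]
        # Coarse filter: rank the group by approximate (PQ-based) distance.
        shortlist = sorted(
            group, key=lambda i: euclidean(query, decode(codes[i], codebooks))
        )[:keep]
        # Exact check: pick the shortlisted point with the smallest original distance.
        best = min(shortlist, key=lambda i: euclidean(query, dataset[i]))
        if euclidean(query, dataset[best]) >= euclidean(query, dataset[current]):
            return current  # no closer neighbor found; stop the descent
        current = best

# Hypothetical data: five points on a line, M = 2 one-dimensional subspaces.
dataset = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (6.0, 0.0), (8.0, 0.0)]
neighbor_graph = {0: [1, 2], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 2]}
codebooks = [[(1.0,), (5.0,), (8.0,)], [(0.0,)]]
codes = [(1, 1), (1, 1), (2, 1), (2, 1), (3, 1)]

result = nearest_by_descent((0.3, 0.0), 4, dataset, neighbor_graph, codes, codebooks)
print(result)
```

Each pass moves strictly closer to the query, so the loop terminates. The approximate distance only prunes the group; the final comparison always uses the original distance.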
In an alternative embodiment, the method further comprises:
selecting a sixth group of neighbor points corresponding to the furthest neighbor candidate points from the neighbor graph;
determining a distance between the product quantization codes of the target query point and each neighbor point in the sixth set of neighbor points;
selecting a seventh group of neighbor points farthest from the target query point from the sixth group of neighbor points according to the distance between the product quantization codes of the target query point and each neighbor point in the sixth group of neighbor points;
Determining an original distance between each neighbor point in the seventh set of neighbor points and the target query point;
selecting a third target candidate point with the largest original distance from the seventh group of neighbor points according to the original distance between each neighbor point in the seventh group of neighbor points and the target query point;
and determining a target farthest adjacent point corresponding to the target query point according to the original distance between the third target candidate point and the target query point and the original distance between the farthest adjacent candidate point and the target query point.
Specifically, because the existing furthest adjacent algorithm is not mature enough, the accurate furthest adjacent point of the target can be determined through iterative processing, and accurate recommendation is realized. Since the iterative process consumes very much operation resources, the operation speed can be increased and the waste of operation resources can be reduced by combining product quantization coding. The method and the device can accurately and efficiently determine the target furthest adjacent point.
In an alternative embodiment, the determining the target farthest neighboring point corresponding to the target query point according to the original distance between the third target candidate point and the target query point and the original distance between the farthest neighboring candidate point and the target query point includes:
And determining the furthest adjacent candidate point as the target furthest adjacent point when the original distance between the third target candidate point and the target query point is greater than or equal to the original distance between the furthest adjacent candidate point and the target query point.
Specifically, if the original distance between the third target candidate point and the target query point is greater than or equal to the original distance between the farthest adjacent candidate point and the target query point, the farthest adjacent candidate point is the data point farthest from the target query point, so that the farthest adjacent candidate point is strictly determined to be farthest from the target query point, the subsequent processing is not needed, and the resource can be saved.
In an alternative embodiment, the determining the target farthest neighboring point corresponding to the target query point according to the original distance between the third target candidate point and the target query point and the original distance between the farthest neighboring candidate point and the target query point includes:
when the original distance between the third target candidate point and the target query point is smaller than the original distance between the farthest adjacent candidate point and the target query point, selecting an eighth group of neighbor points corresponding to the third target candidate point from the neighbor map;
Determining a distance between the product quantization codes of the target query point and each neighbor point in the eighth set of neighbor points;
selecting a ninth group of neighbor points farthest from the target query point from the eighth group of neighbor points according to the distance between the product quantization codes of the third target candidate point and each neighbor point in the eighth group of neighbor points;
determining an original distance between each neighbor point in the ninth set of neighbor points and the target query point;
selecting a fourth target candidate point with the largest original distance from the ninth group of neighbor points according to the original distance between each neighbor point in the ninth group of neighbor points and the target query point;
and determining the furthest adjacent point of the target according to the original distance between the fourth target candidate point and the target query point and the original distance between the third target candidate point and the target query point.
Specifically, if the original distance between the third target candidate point and the target query point is smaller than the original distance between the farthest neighboring candidate point and the target query point, it is indicated that the third target candidate point is farther from the target query point than the farthest neighboring candidate point, and there may still be data points farther from the target query point in the neighboring points of the third target candidate point. Thus, by further processing, a more accurate target furthest neighbor can be determined.
In an alternative embodiment, the method further comprises:
and determining the furthest adjacent candidate point as a target furthest adjacent point corresponding to the target query point.
Specifically, since the starting point of the furthest adjacent search is accurately determined, the furthest adjacent search is performed according to the accurate starting point, and the determined furthest adjacent candidate point is also more accurate, so that the furthest adjacent candidate point can be determined as the target furthest adjacent point. Therefore, the furthest adjacent point of the target can be determined quickly, and the efficiency of the furthest adjacent search is improved.
In a second aspect, the present invention provides an apparatus for determining a furthest neighboring candidate point, the apparatus comprising:
the acquisition module is used for acquiring a target query point, a target data set, a neighbor graph corresponding to the target data set, a far neighbor graph and product quantization codes of each data point in the target data set;
the selecting module is used for selecting a first starting point from the target data set as a nearest neighbor candidate point corresponding to the target query point, wherein the first starting point is any data point in the target data set;
a determining module, configured to extract a first set of neighbor points corresponding to the first starting point from the neighbor map, and determine a distance between product quantization codes of the target query point and each neighbor point in the first set of neighbor points;
The selecting module is used for selecting a second group of neighbor points closest to the target query point from the first group of neighbor points according to the distance between the product quantization codes of the target query point and each neighbor point in the first group of neighbor points; selecting a first target candidate point with the minimum original distance from the second group of neighbor points according to the original distance between each neighbor point in the second group of neighbor points and the target query point;
the determining module is configured to determine, when an original distance between the first target candidate point and the target query point is greater than or equal to an original distance between the first starting point and the target query point, a farthest neighboring candidate point according to the first target candidate point, the target query point and the far neighboring graph.
In a third aspect, the present invention provides a computer device comprising a memory and a processor that are communicatively connected to each other, wherein the memory stores computer instructions, and the processor executes the computer instructions to perform the method for determining the furthest adjacent candidate point according to the first aspect or any of its corresponding embodiments.
In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of determining the furthest neighbor of the first aspect or any of its corresponding embodiments described above.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method of determining a furthest neighbor candidate point according to an embodiment of the invention;
FIG. 2 is a schematic diagram of determining product quantization encoding according to an embodiment of the present invention;
FIG. 3 is a flow chart of another method of determining the furthest neighbor candidate point according to an embodiment of the invention;
FIG. 4 is a flow chart of a method of determining nearest neighbor candidate points according to an embodiment of the invention;
FIG. 5 is a flow chart of a method of determining a target furthest neighbor according to an embodiment of the invention;
FIG. 6 is a flow chart of another method of determining a target furthest neighbor according to an embodiment of the invention;
FIG. 7 is a block diagram of an apparatus for determining a furthest neighbor candidate point according to an embodiment of the invention;
FIG. 8 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms related to the embodiments of the present invention are explained below.
Furthest neighbor search: given a data set $D = \{x_1, x_2, \dots, x_n\} \subset \mathbb{R}^d$ comprising n d-dimensional feature vectors, let $\lVert \cdot , \cdot \rVert$ denote the Euclidean distance between two points (i.e., the original distance). For a user-specified query point $q \in \mathbb{R}^d$, the furthest neighbor search refers to searching D for the feature vector $x^*$ that is furthest from the query point, which can be expressed as $x^* = \arg\max_{x \in D} \lVert x, q \rVert$. The furthest neighbor search is commonly applied in the computational geometry and big data fields, for example to the maximum spanning tree (Maximum Spanning Tree), the diameter of a data set, complete-linkage clustering (Complete Linkage Clustering) and nonlinear dimensionality reduction (Non-Linear Dimensionality Reduction).
Nearest neighbor search: given a data set $D = \{x_1, x_2, \dots, x_n\} \subset \mathbb{R}^d$ comprising n d-dimensional feature vectors, let $\lVert \cdot , \cdot \rVert$ denote the Euclidean distance between two points (i.e., the original distance). For a user-specified query point $q \in \mathbb{R}^d$, the nearest neighbor search refers to searching D for the feature vector $x^*$ that is nearest to the query point, which can be expressed as $x^* = \arg\min_{x \in D} \lVert x, q \rVert$.
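As a point of reference, both definitions can be written as an exhaustive scan over the data set. This brute-force sketch only illustrates the definitions, not the graph-based method of the invention; the function names and sample points are illustrative assumptions.

```python
import math

def euclidean(a, b):
    """Original (Euclidean) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def furthest_neighbor(dataset, query):
    """Index of the data point with the largest original distance to the query."""
    return max(range(len(dataset)), key=lambda i: euclidean(dataset[i], query))

def nearest_neighbor(dataset, query):
    """Index of the data point with the smallest original distance to the query."""
    return min(range(len(dataset)), key=lambda i: euclidean(dataset[i], query))

D = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (-3.0, 4.0)]
q = (0.9, 0.8)

print(furthest_neighbor(D, q), nearest_neighbor(D, q))
```

The scan computes n distances in d dimensions for every query, which is the cost that the graph-based search with product quantization aims to reduce.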
Product quantization coding (Product quantization, PQ) technique: the PQ technique is a coding technique capable of coding an original high-dimensional vector into a compact code (compact code). The approximate distance calculated by the two PQ encodings corresponding to the two original high-dimensional vectors, respectively, is quite close to the original distance between them.
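A minimal sketch of the approximate distance between two PQ codes follows. It assumes that codebooks (one list of cluster centers per subspace) are already available and that codes use 1-based cluster center numbers; the codebook values and the name `pq_distance` are hypothetical.

```python
import math

def pq_distance(code_a, code_b, codebooks):
    """Approximate distance between two points, computed only from their PQ codes.

    Each code entry is a 1-based index into the codebook of its subspace, and the
    squared distances between the indicated cluster centers are summed over subspaces.
    """
    total = 0.0
    for i, (a, b) in enumerate(zip(code_a, code_b)):
        center_a = codebooks[i][a - 1]
        center_b = codebooks[i][b - 1]
        total += sum((p - q) ** 2 for p, q in zip(center_a, center_b))
    return math.sqrt(total)

# Hypothetical codebooks: M = 2 subspaces, K = 2 cluster centers each.
codebooks = [
    [(0.0, 0.0), (4.0, 0.0)],
    [(0.0, 0.0), (0.0, 3.0)],
]

d = pq_distance((1, 1), (2, 2), codebooks)
print(d)
```

The result approximates the original distance only to the extent that each vector lies close to the cluster centers its code indicates.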
In the big data field, when recommending data for a target user, data similar to the behavior record of the target user is generally recommended for the target user according to the behavior record of the target user on a related platform. In making the recommendation, the furthest neighbor algorithm may be used. Through the furthest neighbor algorithm, the relevant platform can find other users which are least similar to the target user in the large database, and recommend data which are disliked by the other users to the target user.
The embodiment of the invention provides a method for determining the furthest adjacent point, which achieves the effect of improving the furthest adjacent searching efficiency by precisely selecting the furthest adjacent candidate point.
In accordance with an embodiment of the present invention, there is provided a method embodiment for determining the most distant candidate point, it being noted that the steps illustrated in the flowchart of the figures may be performed in a computer system, such as a set of computer executable instructions, and, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order other than that illustrated herein.
In this embodiment, a method for determining a furthest adjacent candidate point is provided, which may be used in a computer device such as a desktop computer or a notebook computer. FIG. 1 is a flowchart of a method for determining a furthest adjacent candidate point according to an embodiment of the present invention. As shown in FIG. 1, the flow includes the following steps:
Step S101, obtaining a target query point, a target data set, a neighbor graph and a far-neighbor graph corresponding to the target data set, and a product quantization code of each data point in the target data set.
Wherein the target dataset may comprise a plurality of data points, each data point being a d-dimensional vector, d being an integer greater than 0. The target query point is also a d-dimensional vector.
Specifically, after the computer device acquires the target data set, the target data set may be input into a neighbor graph algorithm to obtain a neighbor graph corresponding to the target data set, and the target data set may be input into a far-neighbor graph algorithm to obtain a far-neighbor graph corresponding to the target data set.
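The text does not specify which neighbor graph or far-neighbor graph algorithm is used. As a stand-in for illustration only, the sketch below builds both graphs by exhaustive comparison, keeping the g nearest and the g farthest points for each data point; the name `build_graphs`, the parameter `g` and the sample data are assumptions.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_graphs(dataset, g):
    """Brute-force stand-in: map each point index to its g nearest points
    (neighbor graph) and its g farthest points (far-neighbor graph)."""
    neighbor_graph, far_graph = {}, {}
    for i, p in enumerate(dataset):
        # All other points, ordered from nearest to farthest.
        others = sorted(
            (j for j in range(len(dataset)) if j != i),
            key=lambda j: euclidean(p, dataset[j]),
        )
        neighbor_graph[i] = others[:g]
        far_graph[i] = others[-g:][::-1]  # farthest first
    return neighbor_graph, far_graph

dataset = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
neighbor_graph, far_graph = build_graphs(dataset, g=2)
print(neighbor_graph)
print(far_graph)
```

This construction is quadratic in the number of data points, so it is only practical for small examples; an approximate graph construction algorithm would normally take its place on a large data set.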
The computer device may enumerate a plurality of combinations of a subspace number parameter and a cluster center number parameter based on the number of data points in the target data set, select a target combination based on an evaluation algorithm and the target data set, and create a corresponding product quantization encoder based on the target subspace number parameter (hereinafter denoted by M) and the target cluster center number parameter (hereinafter denoted by K) included in the target combination, where M < d. The target data set is then input into the product quantization encoder to obtain the product quantization code corresponding to each data point in the target data set.
The product quantization encoder may process all data points in the target data set as follows:
First, the d-dimensional space in which the target data set is located is divided into M subspaces, each subspace including s = d/M dimensions. According to the divided M subspaces, each data point in the target data set is decomposed into a plurality of components, that is, each data point x can be regarded as the concatenation of M s-dimensional subvectors $u_i(x)$, $1 \le i \le M$:
$x = [u_1(x), u_2(x), \dots, u_M(x)]$ …… (1)
for example, the target dataset includes a total of n d-dimensional vectors of x1, x2, x3 … … xn, and the corresponding plurality of components for each data point may be as shown in table 1.
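TABLE 1
Data point   D1       D2       D3       ……   DM
x1           u1(x1)   u2(x1)   u3(x1)   ……   uM(x1)
x2           u1(x2)   u2(x2)   u3(x2)   ……   uM(x2)
x3           u1(x3)   u2(x3)   u3(x3)   ……   uM(x3)
……           ……       ……       ……       ……   ……
xn           u1(xn)   u2(xn)   u3(xn)   ……   uM(xn)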
Wherein the plurality of components corresponding to data point x1 includes u1 (x 1), u2 (x 1), u3 (x 1) … … uM (x 1), and so on, each row except the first row represents each data point and the plurality of components corresponding to the data point. u1 (x 1), u1 (x 2), u1 (x 3) … … u1 (xn) form the sub-data set D1 corresponding to the first subspace, and so on, the plurality of components in each column except the first column form the sub-data set corresponding to each subspace.
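The decomposition into components and sub-data sets can be sketched as follows. The sketch assumes d is exactly divisible by M; the function names and sample vectors are illustrative assumptions.

```python
def split_into_subvectors(x, M):
    """Split a d-dimensional vector into M subvectors of s = d / M dimensions each."""
    d = len(x)
    if d % M != 0:
        raise ValueError("this sketch assumes d is divisible by M")
    s = d // M
    return [tuple(x[i * s:(i + 1) * s]) for i in range(M)]

def build_sub_datasets(dataset, M):
    """Return [D1, ..., DM], where Di collects the i-th component of every data point."""
    rows = [split_into_subvectors(x, M) for x in dataset]
    return [[row[i] for row in rows] for i in range(M)]

dataset = [(1.0, 2.0, 3.0, 4.0), (5.0, 6.0, 7.0, 8.0)]
sub_datasets = build_sub_datasets(dataset, M=2)
print(sub_datasets)
```

Each row of the intermediate `rows` list corresponds to one data point and its M components, and each returned sub-data set corresponds to one subspace.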
Secondly, each sub-data set is input into a clustering algorithm in the product quantization encoder to obtain a plurality of cluster centers corresponding to each sub-data set. For each sub-data set Di, the cluster centers may be represented as $C_i = \{C_{i1}, C_{i2}, \dots, C_{iK}\}$, and all sub-data sets together correspond to M groups of cluster centers $C_1, C_2, \dots, C_M$. The clustering algorithm may be a K-Means algorithm.
All cluster centers corresponding to the target dataset may be as shown in table 2.
TABLE 2
D1    D2    D3    ……    DM
C11   C21   C31   ……    CM1
C12   C22   C32   ……    CM2
C13   C23   C33   ……    CM3
……    ……    ……    ……    ……
C1k   C2k   C3k   ……    CMk
Wherein C11, C12, C13 … … C1k are a plurality of cluster centers corresponding to the first sub-dataset, and the plurality of cluster centers form a first cluster center group. C21, C22, C23 … … C2k are a plurality of cluster centers corresponding to the second sub-dataset, the plurality of cluster centers forming a second cluster center group. And so on, obtaining M cluster center groups.
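The training of the M cluster center groups can be sketched with a plain Lloyd-style K-Means run on each sub-data set. The initialization by random sampling, the fixed iteration count, the seed and the names `kmeans` and `train_codebooks` are assumptions made for this illustration; the text only states that a K-Means algorithm may be used.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, K, iters=20, seed=0):
    """Plain Lloyd iteration; returns K cluster centers as tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, K)
    for _ in range(iters):
        clusters = [[] for _ in range(K)]
        for p in points:
            j = min(range(K), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers

def train_codebooks(dataset, M, K):
    """Run K-Means once per subspace; returns M groups of K cluster centers."""
    s = len(dataset[0]) // M
    codebooks = []
    for i in range(M):
        sub_dataset = [tuple(x[i * s:(i + 1) * s]) for x in dataset]
        codebooks.append(kmeans(sub_dataset, K))
    return codebooks

dataset = [
    (0.0, 0.1, 5.0, 5.1), (0.2, 0.0, 5.2, 4.9), (0.1, 0.2, 9.0, 9.1),
    (8.0, 8.1, 9.2, 8.9), (8.2, 7.9, 5.1, 5.0), (7.9, 8.0, 9.1, 9.0),
]
codebooks = train_codebooks(dataset, M=2, K=2)
print(codebooks)
```

The returned list has one entry per subspace, and the j-th center of the i-th entry plays the role of Cij in Table 2.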
Third, in each subspace, the distance between the component of each data point in that subspace and each of the cluster centers corresponding to the subspace is determined, the identification information of the cluster center nearest to the component is determined, and that identification information is taken as the product quantization code of the data point in the subspace. The corresponding expression can be as follows:
$I_i = \arg\min_{1 \le j \le K} \lVert u_i(x), C_{ij} \rVert$ …… (2)
The identification information (Ii) of a cluster center may be the sequence number of the cluster center in the cluster center group to which it belongs. In this way, the product quantization code corresponding to each data point can be obtained. For example, the component of the data point x1 in the first subspace is u1(x1), the plurality of cluster centers corresponding to the first subspace include C11, C12, C13 …… C1k, the cluster center closest to u1(x1) is C13, and the identification information of the cluster center C13 is 3. The product quantization code Q(x1) for data point x1 is (I1, I2, I3 …… IM), where I1 = 3 and 1 ≤ Ii ≤ K for each i.
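Expression (2) can be sketched as follows, using 1-based sequence numbers as identification information. The codebook values, the sample point and the name `pq_encode` are hypothetical and chosen so that the first component maps to the third cluster center, as in the example above.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pq_encode(x, codebooks):
    """Return (I1, ..., IM): for each subspace, the 1-based sequence number
    of the cluster center nearest to the component of x in that subspace."""
    M = len(codebooks)
    s = len(x) // M
    code = []
    for i, centers in enumerate(codebooks):
        u = x[i * s:(i + 1) * s]  # component u_i(x)
        j = min(range(len(centers)), key=lambda j: sq_dist(u, centers[j]))
        code.append(j + 1)
    return tuple(code)

# Hypothetical cluster center groups for M = 2 subspaces, K = 3 centers each.
codebooks = [
    [(0.0, 0.0), (10.0, 10.0), (1.0, 1.0)],  # C11, C12, C13
    [(5.0, 5.0), (0.0, 0.0), (9.0, 9.0)],    # C21, C22, C23
]
x1 = (0.9, 1.2, 4.0, 6.0)

code = pq_encode(x1, codebooks)
print(code)
```

Minimizing the squared distance selects the same cluster center as minimizing the distance itself, so the square root is skipped.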
The above process of determining the product quantization code for each data point may be as shown in fig. 2, where the product quantization encoder quantizes each data point into an integer string of M dimensions, i.e., the product quantization code, and the vector is represented by the cluster center indicated by the product quantization code.
Step S102, selecting a first starting point in the target data set as a nearest neighbor candidate point corresponding to the target query point.
Wherein the first starting point is any one data point in the target data set.
Specifically, the computer device may select any data point (i.e., the first starting point) in the target data set as the nearest neighbor candidate point corresponding to the target query point. The randomly selected data point may or may not be the closest point to the target query point. Therefore, after the first starting point is selected as the nearest neighbor candidate point of the target query point, searching can be performed in the neighbor graph with the first starting point as a reference to find the data point nearest to the target query point.
Step S103, extracting a first group of neighbor points corresponding to the first starting point from the neighbor map, and determining the distance between the product quantization codes of the target query point and each neighbor point in the first group of neighbor points.
In particular, the computer device may extract a first set of neighbor points of the first starting point from the neighbor map. Wherein the number of neighbor points in the first set of neighbor points may be a first preset threshold (g). Further, the product quantization code of each neighbor point in the first group of neighbor points may be determined according to the identification information of each neighbor point in the first group of neighbor points. For example, the identification information of the neighbor point is 1, and the product quantization coding of the neighbor point x1 can be determined as (I1, I2, I3 … … IM) according to the identification information. The computer device may then determine a plurality of cluster centers indicative of each neighbor point based on the product quantization encoding of each neighbor point. For example, the cluster center indicated by I1 is C11, the cluster center indicated by I2 is C25 … …, and so on, it can be determined that the plurality of cluster centers for indicating x1 include C11, C25, C31, … …, CM6.
The computer device may obtain, in advance and according to the target subspace quantity parameter, a plurality of components corresponding to the target query point, for example, u1(q), u2(q), u3(q) …… uM(q). Further, the computer device may calculate the square of the distance between each component of the target query point and each of the plurality of cluster centers of the subspace to which that component belongs, as may be shown in table 3.
TABLE 3
The computer device may determine, in table 3, a square value of a distance between each component of the target query point and a cluster center of a subspace to which the component belongs, according to a plurality of cluster centers corresponding to each neighbor point in the first set of neighbor points, and further, sum the square values of the distances between each component and the cluster center corresponding to the component, so as to obtain a distance between the target query point and each neighbor point. For example, the cluster centers corresponding to x1 are C11, C25, C31, … …, and CM6, the computer device may determine a first distance square value in table 3 according to C11 and u1 (q), determine a second distance square value in table 3 according to C25 and u2 (q), determine a third distance square value in table 3 according to C31 and u3 (q), determine an mth distance square value in table 3 according to uM (q) and CM6, and further add the first distance square value, the second distance square value, and the third distance square value … … mth distance square value to obtain a distance square value between q and x1, where the distance square value may be used to indicate a distance between q and x 1.
The above calculation process can be expressed by the following expression:
$$\lVert q, Q(x) \rVert^2 = \sum_{i=1}^{M} \lVert u_i(q), C_i[I_i] \rVert^2$$
where q is the target query point, x is any data point in the target data set, Q(x) represents the product quantization code corresponding to x, u_i(q) represents the component of q in the i-th subspace of the d-dimensional space, and C_i[I_i] represents the cluster center of the i-th subspace indicated by I_i, that is, the cluster center closest to the component of x in the i-th subspace. ||q, Q(x)||^2 may be referred to as the asymmetric quantization distance (Asymmetric Quantizer Distance, AQD).
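The table lookup described above can be sketched as follows. This is an illustration only: it assumes squared Euclidean distance within each subspace and 1-based code entries, and the names `build_table` and `aqd` as well as the toy values are hypothetical.

```python
from typing import List, Sequence


def build_table(q: Sequence[float],
                codebooks: List[List[Sequence[float]]]) -> List[List[float]]:
    """table[i][j] holds the squared distance between the (i+1)-th
    component of q and the (j+1)-th cluster center of that subspace."""
    m = len(codebooks)
    s = len(q) // m  # subspace dimension; assumes d is divisible by M
    table = []
    for i, centers in enumerate(codebooks):
        u_i = q[i * s:(i + 1) * s]
        table.append([sum((a - b) ** 2 for a, b in zip(u_i, c))
                      for c in centers])
    return table


def aqd(table: List[List[float]], code: Sequence[int]) -> float:
    """Asymmetric quantization distance: one lookup per subspace, summed."""
    return sum(table[i][idx - 1] for i, idx in enumerate(code))


# Toy data (hypothetical): d = 4, M = 2, K = 2
codebooks = [[(0.0, 0.0), (1.0, 1.0)], [(0.0, 1.0), (5.0, 5.0)]]
table = build_table([1.0, 0.0, 4.0, 5.0], codebooks)
d2 = aqd(table, [2, 2])  # squared distance between q and a point coded (2, 2)
```

The table is built once per query point, after which each neighbor point costs only M lookups and additions.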
Step S104, selecting a second group of neighbor points nearest to the target query point from the first group of neighbor points according to the distance between the product quantization codes of the target query point and each neighbor point in the first group of neighbor points.
Specifically, the computer device may sort the neighbor points in the first group of neighbor points according to the distance between the target query point and the product quantization code of each neighbor point, and then select the second group of neighbor points according to that ranking. The neighbor points may be sorted by distance from small to large, so that the neighbor points whose rank is less than or equal to a second preset threshold are selected as the second group of neighbor points. The number of neighbor points in the second group of neighbor points may be the second preset threshold, which may be τ = log₂ g.
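The selection in step S104 can be sketched as below. The function name `select_nearest_by_aqd`, the point labels, and the distance values are hypothetical; rounding log₂ g down to an integer (with a minimum of 1) is an assumption, since the text does not say how a non-integer threshold is handled.

```python
import math
from typing import Dict, List


def select_nearest_by_aqd(aqd_by_point: Dict[str, float], g: int) -> List[str]:
    """Keep the tau = log2(g) neighbor points with the smallest AQD."""
    tau = max(1, int(math.log2(g)))  # second preset threshold (rounding assumed)
    ranked = sorted(aqd_by_point, key=aqd_by_point.get)  # small to large
    return ranked[:tau]


# Hypothetical AQD values for a first group of g = 8 neighbor points
first_group = {"x1": 7.5, "x2": 0.4, "x3": 3.1, "x4": 9.0,
               "x5": 1.2, "x6": 5.5, "x7": 2.0, "x8": 8.8}
second_group = select_nearest_by_aqd(first_group, g=len(first_group))
```

With g = 8 the shortlist holds three points, so only three original distances need to be computed in the next step.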
Step S105, selecting a first target candidate point with the minimum original distance from the second group of neighbor points according to the original distance between each neighbor point in the second group of neighbor points and the target query point.
Specifically, the computer device may first determine an original distance between each neighbor point in the second set of neighbor points and the target query point, and may specifically calculate according to the following expression:
where x is any neighbor point in the second group of neighbor points, and q is the target query point.
Further, the computer device may sort the neighbor points in the second group of neighbor points by original distance from small to large, and determine the neighbor point ranked first as the first target candidate point.
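Step S105 can be illustrated with the sketch below. It assumes that the original distance is the Euclidean distance over all d dimensions, which is an assumption made for illustration; the names `original_distance` and `first_target_candidate` and the sample points are hypothetical.

```python
import math
from typing import Dict, Sequence


def original_distance(x: Sequence[float], q: Sequence[float]) -> float:
    """Distance over the full d-dimensional vectors (Euclidean assumed)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, q)))


def first_target_candidate(second_group: Dict[str, Sequence[float]],
                           q: Sequence[float]) -> str:
    """Return the neighbor point whose original distance to q is smallest."""
    return min(second_group,
               key=lambda name: original_distance(second_group[name], q))


q = (0.0, 0.0)
second_group = {"x2": (3.0, 4.0), "x5": (1.0, 0.0), "x7": (0.0, 2.0)}
best = first_target_candidate(second_group, q)
```

Taking the minimum directly gives the same result as sorting from small to large and keeping the first entry.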
Step S106, when the original distance between the first target candidate point and the target query point is greater than or equal to the original distance between the first starting point and the target query point, determining the furthest adjacent candidate point according to the first target candidate point, the target query point and the far-adjacent graph.
Specifically, the computer device may determine whether the original distance between the first target candidate point and the target query point is greater than or equal to the original distance between the first starting point and the target query point. If so, the first starting point selected in step S102 is the data point closest to the target query point, and no further search in the neighbor map is needed.
After determining the nearest neighbor point of the target query point, the nearest neighbor point can be determined as a starting point of the furthest neighbor search, and the furthest neighbor point corresponding to the target query point is searched in the far neighbor graph.
The method for determining the farthest adjacent candidate point provided in this embodiment:
First, the target query point is not a data point in the target data set, whereas the neighbor graph and the far-neighbor graph are both built from the target data set. The number of data points in the target data set is large; if the target query point were added to the target data set and a far-neighbor graph were then created for the enlarged data set, the furthest neighbor search would be inefficient. Moreover, since existing far-neighbor graph algorithms are not mature enough, creating a new far-neighbor graph for every query point would waste considerable resources. Therefore, the data point closest to the target query point is found in the target data set and used in place of the target query point for the furthest neighbor search in the far-neighbor graph, which improves the efficiency of the furthest neighbor search.
Secondly, the number of data points in the target data set is large, and existing furthest adjacent algorithms are not mature enough, that is, the neighbor points recorded for each data point in the far-adjacent graph are of low accuracy. If a point were randomly selected from the far-adjacent graph as the starting point, the search would be largely blind: many iterations would be needed before the furthest adjacent point of the target query point is found, and the search efficiency would be extremely low. By accurately selecting the starting point, the method reduces the number of iterations in the furthest adjacent search and greatly improves its efficiency.
Thirdly, in the related art, the target data set is clustered to obtain a plurality of cluster centers, the target cluster center closest to the target query point is found among them, and the furthest neighbor search is performed with the target cluster center as the starting point. Assume that n data points are clustered into K1 cluster centers, where n = 1,000,000 and K1 = 10,000. Because the data volume is large, the number of cluster centers must also be very large to represent the data points in the target data set accurately. The cluster centers and the target query point are d-dimensional data, and according to formula (3), calculating the distance between two d-dimensional data points requires d operations. With d = 128, finding the cluster center closest to the target query point therefore requires K1 × d = 1,280,000 operations.
In the present scheme, the n data points correspond to K2 × M cluster centers. Because clustering is performed on each sub-data set, the K2 × M cluster centers can form K2^M d-dimensional vectors through permutation and combination, so K2 may be much smaller than K1. For example, with K2 = 256 and M = 8, the K2 × M cluster centers can be combined into K2^M = 256^8 d-dimensional vectors, far more than the n = 1,000,000 data points in the target data set. Calculating the distances between q and all cluster centers takes K2 × M × s = K2 × d = 256 × 128 = 32,768 operations, where s = d/M is the dimension of each subspace, which is far fewer than 1,280,000. From the above analysis, combining product quantization coding greatly reduces the computation needed to search for the nearest neighbor point of the target query point.
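The operation counts in this comparison can be checked with a few lines of arithmetic. The variable names are illustrative, and d = 128 is taken from the product 256 × 128 given above.

```python
# Figures from the comparison above
n = 1_000_000      # data points in the target data set
d = 128            # dimension of each data point
K1 = 10_000        # cluster centers in the related art
K2, M = 256, 8     # cluster centers per subspace, number of subspaces
s = d // M         # dimension of each subspace

ops_related_art = K1 * d   # distance to every d-dimensional center
ops_pq = K2 * M * s        # distance to every subspace center
representable = K2 ** M    # vectors expressible by combining centers

print(ops_related_art, ops_pq, representable > n)
```

The product quantization variant needs roughly 39 times fewer operations while still being able to represent far more distinct vectors than there are data points.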
In summary, by combining the neighbor graph and the product quantization coding technology, the data point closest to the target query point can be determined efficiently, and further, the data point is used as the starting point for the furthest neighbor search, so that the efficiency of the furthest neighbor search can be greatly improved.
In this embodiment, a method for determining a furthest adjacent candidate point is provided, which may be used in a computer device, such as a desktop computer, a notebook computer, etc., and fig. 3 is a flowchart of a method for determining a furthest adjacent candidate point according to an embodiment of the present invention, as shown in fig. 3, where the flowchart includes the following steps:
Step S301, obtaining a target query point, a target data set, a neighbor graph and a far-neighbor graph corresponding to the target data set, and a product quantization code of each data point in the target data set.
Step S302, selecting a first starting point in the target data set as a nearest neighbor candidate point corresponding to the target query point.
Step S303, extracting a first group of neighbor points corresponding to the first starting point from the neighbor map, and determining the distance between the product quantization codes of the target query point and each neighbor point in the first group of neighbor points.
Step S304, selecting a second group of neighbor points nearest to the target query point from the first group of neighbor points according to the distance between the product quantization codes of the target query point and each neighbor point in the first group of neighbor points.
Step S305, selecting a first target candidate point with the minimum original distance from the second group of neighbor points according to the original distance between each neighbor point in the second group of neighbor points and the target query point.
For the detailed process of steps S301 to S305, refer to steps S101 to S105, which will not be described again here.
Step S306, when the original distance between the first target candidate point and the target query point is greater than or equal to the original distance between the first starting point and the target query point, determining the furthest adjacent candidate point according to the first target candidate point, the target query point and the far-adjacent graph.
The step S306 includes:
step S3061, determining a third set of neighbor points corresponding to the first target candidate point in the far-neighbor graph.
Specifically, the computer device may determine, according to the identification information of the first target candidate point, a third set of neighbor points corresponding to the first target candidate point in the far-neighbor graph. Wherein the number of the third set of neighbor points may be a first preset threshold.
Step S3062, determining an original distance between each neighbor point in the third set of neighbor points and the target query point.
Specifically, the computer device may determine, according to equation (3), an original distance between each neighbor point in the third set of neighbor points and the target query point.
Step S3063, selecting the neighbor point with the largest original distance from the third group of neighbor points as the farthest neighbor candidate point according to the original distance between each neighbor point in the third group of neighbor points and the target query point.
Specifically, the computer device may sort the neighbor points in the third group of neighbor points by original distance from large to small, and determine the neighbor point ranked first as the farthest neighbor candidate point.
According to the method for determining the furthest adjacent candidate point, the original distance between two data points can represent the real distance between the two data points, and the nearest adjacent point of the target query point can be accurately determined through the neighbor graph and the product quantization coding technology. Therefore, the most distant neighbor candidate point can be efficiently and accurately determined by replacing the target query point with the nearest neighbor point and performing the most distant neighbor search using the original distance.
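Steps S3061 to S3063 can be sketched as follows. The sketch assumes Euclidean original distance and a far-neighbor graph stored as adjacency lists keyed by identification information; `furthest_candidate` and the sample data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def furthest_candidate(far_graph: Dict[int, List[int]],
                       data: Dict[int, Sequence[float]],
                       start: int,
                       q: Sequence[float]) -> int:
    """Among the far neighbors of `start`, return the point whose
    original distance to the query point q is largest."""
    third_group = far_graph[start]  # S3061: look up by identification info
    # S3062 and S3063: compute original distances and keep the largest
    return max(third_group, key=lambda p: math.dist(data[p], q))


# Hypothetical data: point 0 is the nearest neighbor already found for q
data = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (5.0, 5.0), 3: (-3.0, 0.0)}
far_graph = {0: [2, 3], 1: [2, 3], 2: [3, 0], 3: [2, 1]}
q = (0.1, 0.0)
candidate = furthest_candidate(far_graph, data, start=0, q=q)
```

The nearest neighbor point only supplies the entry into the far-neighbor graph; the distances themselves are still measured against the target query point.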
In this embodiment, a method for determining a nearest neighbor candidate point is provided, which may be used in a computer device, such as a desktop computer, a notebook computer, and the like. When the original distance between the first target candidate point and the target query point is smaller than the original distance between the first starting point and the target query point, the first target candidate point is closer to the target query point than the first starting point selected in step S102, and it is necessary to determine whether any neighbor point of the first target candidate point is even closer to the target query point. Fig. 4 is a flowchart of a method of determining nearest neighbor candidate points according to an embodiment of the present invention. As shown in fig. 4, the flowchart includes the following steps:
Step S401, selecting a fourth group of neighbor points corresponding to the first target candidate point from the neighbor map, and determining the distance between the product quantization codes of the target query point and each neighbor point in the fourth group of neighbor points.
Specifically, the computer device may extract a fourth set of neighbor points corresponding thereto in the neighbor map. The number of neighbor points in the fourth set of neighbor points may be a first preset threshold. Further, the product quantization code of each neighbor point in the fourth group of neighbor points can be determined according to the identification information of each neighbor point in the fourth group of neighbor points.
The computer device may determine, in table 3, the square value of the distance between each component of the target query point and the cluster center of the subspace to which the component belongs, according to the plurality of cluster centers corresponding to each neighbor point in the fourth group of neighbor points, and then sum these square values to obtain the distance between the target query point and each neighbor point in the fourth group of neighbor points.
Step S402, selecting a fifth group of neighbor points nearest to the target query point from the fourth group of neighbor points according to the distance between the product quantization codes of the target query point and each neighbor point in the fourth group of neighbor points.
Specifically, the computer device may sort the neighbor points in the fourth group of neighbor points according to the distance between the target query point and the product quantization code of each neighbor point, and then select the fifth group of neighbor points according to that ranking. The neighbor points may be sorted by distance from small to large, so that the neighbor points whose rank is less than or equal to the second preset threshold are selected as the fifth group of neighbor points. The number of neighbor points in the fifth group of neighbor points may be the second preset threshold.
Step S403, determining an original distance between each neighbor point in the fifth group of neighbor points and the target query point.
Specifically, the computer device may calculate the original distance between each neighbor point in the fifth set of neighbor points and the target query point according to equation (3).
Step S404, selecting a second target candidate point with the minimum original distance from the fifth group of neighbor points according to the original distance between each neighbor point in the fifth group of neighbor points and the target query point.
Specifically, the computer device may sort the neighbor points in the fifth group of neighbor points by original distance from small to large, and determine the neighbor point ranked first as the second target candidate point.
The computer device may determine whether an original distance between the second target candidate point and the target query point is greater than or equal to an original distance between the first target candidate point and the target query point.
If so, the first target candidate point is the closest data point to the target query point, and the farthest adjacent candidate point can be determined subsequently according to the original distance between the second target candidate point and the target query point, the original distance between the first starting point and the target query point and the far-adjacent graph.
If not, the second target candidate point is closer to the target query point than the first target candidate point, and data points which are closer to the target query point still exist in the neighbor points of the second target candidate point, and the farthest neighbor candidate point can be determined according to the original distance between the second target candidate point and the target query point, the original distance between the first starting point and the target query point, the far neighbor map and the neighbor map.
In the method for determining the furthest adjacent candidate point provided by the embodiment, because the number of data points in the target data set is large, the probability of randomly selecting one data point in the target data set as the nearest adjacent point of the target query point is obviously extremely low, so that multiple iterative computations are required to find the accurate nearest adjacent point corresponding to the target query point. Further, by using the accurate nearest neighbor point as the starting point of the furthest neighbor search, the efficiency of the furthest neighbor search can be improved.
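The iteration described above (shortlist by product quantization distance, confirm by original distance, move on while a closer point is found) can be sketched as a loop. This is a simplified illustration: `nearest_search` is a hypothetical name, Euclidean original distance is assumed, and `aqd` stands in for the table lookup of the product quantization distance.

```python
import math
from typing import Callable, Dict, List, Sequence


def nearest_search(q: Sequence[float],
                   data: Dict[int, Sequence[float]],
                   neighbor_graph: Dict[int, List[int]],
                   aqd: Callable[[int], float],
                   start: int) -> int:
    """Greedy walk on the neighbor graph toward the point nearest to q."""
    current = start
    while True:
        group = neighbor_graph[current]                   # first / fourth group
        tau = max(1, int(math.log2(len(group))))          # rounding is assumed
        shortlist = sorted(group, key=aqd)[:tau]          # second / fifth group
        cand = min(shortlist, key=lambda p: math.dist(data[p], q))
        # Stop when the candidate is not closer than the current point
        if math.dist(data[cand], q) >= math.dist(data[current], q):
            return current
        current = cand


# Hypothetical one-dimensional data set and neighbor graph
data = {i: (float(i),) for i in range(5)}
neighbor_graph = {0: [1, 2], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 2]}
q = (3.2,)
# Stand-in for the AQD lookup: here simply the exact squared distance
nearest = nearest_search(q, data, neighbor_graph,
                         aqd=lambda p: math.dist(data[p], q) ** 2, start=0)
```

Because the original distance to q strictly decreases at every move, the loop ends after a finite number of iterations.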
In this embodiment, a method for determining a target furthest adjacent point is provided, which may be used in a computer device, such as a desktop computer, a notebook computer, etc., and fig. 5 is a flowchart of a method for determining a target furthest adjacent point according to an embodiment of the present invention, as shown in fig. 5, where the flowchart includes the following steps:
Step S501, obtaining a target query point, a target data set, a neighbor graph and a far-neighbor graph corresponding to the target data set, and a product quantization code of each data point in the target data set.
Step S502, selecting a first starting point in the target data set as a nearest neighbor candidate point corresponding to the target query point.
Step S503, extracting a first set of neighbor points corresponding to the first starting point from the neighbor map, and determining a distance between the product quantization codes of the target query point and each neighbor point in the first set of neighbor points.
Step S504, selecting a second group of neighbor points nearest to the target query point from the first group of neighbor points according to the distance between the product quantization codes of the target query point and each neighbor point in the first group of neighbor points.
In step S505, according to the original distance between each neighboring point in the second set of neighboring points and the target query point, a first target candidate point with the smallest original distance is selected from the second set of neighboring points.
In step S506, when the original distance between the first target candidate point and the target query point is greater than or equal to the original distance between the first starting point and the target query point, the farthest neighboring candidate point is determined according to the first target candidate point, the target query point and the far neighboring graph.
For the detailed process of steps S501 to S506, refer to steps S101 to S106, which will not be described again here.
Step S507, determining the furthest adjacent candidate point as a target furthest adjacent point corresponding to the target query point.
Specifically, the computer device may directly determine the furthest adjacent candidate point as the target furthest adjacent point corresponding to the target query point, and may then make a recommendation to the user according to the data indicated by the target furthest adjacent point.
According to the method for determining the target furthest adjacent point, since the starting point of the furthest adjacent search is accurately determined, the furthest adjacent search is performed according to the accurate starting point, and the determined furthest adjacent candidate point is also accurate, so that the furthest adjacent candidate point can be determined as the target furthest adjacent point. Therefore, the furthest adjacent point of the target can be determined quickly, and the efficiency of the furthest adjacent search is improved.
In this embodiment, a method for determining a target furthest adjacent point is provided, which may be used in a computer device, such as a desktop computer, a notebook computer, etc., and fig. 6 is a flowchart of a method for determining a target furthest adjacent point according to an embodiment of the present invention, as shown in fig. 6, where the flowchart includes the following steps:
Step S601, selecting a sixth set of neighbor points corresponding to the most distant neighbor candidate points from the neighbor map.
In particular, the computer device may extract a sixth set of neighbor points of the furthest neighbor candidate point from the neighbor map. The number of neighbor points in the sixth set of neighbor points may be a first preset threshold.
Step S602, determining a distance between the product quantization codes of the target query point and each neighbor point in the sixth set of neighbor points.
Specifically, the computer device may determine, according to the identification information of each neighbor point in the sixth set of neighbor points, a product quantization code of each neighbor point in the sixth set of neighbor points. Further, according to a plurality of clustering centers corresponding to each neighbor point in the sixth set of neighbor points, a square value of a distance between each component of the target query point and the clustering center of the subspace to which the component belongs for each neighbor point may be determined in table 3, and further, the square value of the distance between each component and the clustering center corresponding to the component may be summed, so that a distance between the target query point and each neighbor point in the sixth set of neighbor points may be obtained.
Step S603, selecting a seventh group of neighbor points farthest from the target query point from the sixth group of neighbor points according to the distance between the product quantization codes of the target query point and each neighbor point in the sixth group of neighbor points.
Specifically, the computer device may sort the neighbor points in the sixth group of neighbor points according to the distance between the target query point and the product quantization code of each neighbor point, and then select the seventh group of neighbor points according to that ranking. The neighbor points may be sorted by distance from large to small, so that the neighbor points whose rank is less than or equal to the second preset threshold are selected as the seventh group of neighbor points. The number of neighbor points in the seventh group of neighbor points may be the second preset threshold.
In step S604, an original distance between each neighbor point in the seventh set of neighbor points and the target query point is determined.
Specifically, the computer device may calculate the original distance between each neighbor point in the seventh set of neighbor points and the target query point according to equation (3).
Step S605, selecting a third target candidate point with the largest original distance from the seventh set of neighbor points according to the original distance between each neighbor point in the seventh set of neighbor points and the target query point.
Specifically, the computer device may sort the neighbor points in the seventh group of neighbor points by original distance from large to small, and determine the neighbor point ranked first as the third target candidate point.
Step S606, determining the target furthest adjacent point corresponding to the target query point according to the original distance between the third target candidate point and the target query point and the original distance between the furthest adjacent candidate point and the target query point.
Specifically, the computer device may determine whether the original distance between the third target candidate point and the target query point is less than or equal to the original distance between the furthest neighboring candidate point and the target query point.
If so, the furthest neighbor candidate point is the data point furthest from the target query point, and the furthest neighbor candidate point can be determined to be the target furthest neighbor point.
If not, the third target candidate point is farther from the target query point than the farthest neighboring candidate point, and data points farther from the target query point still exist in the neighboring points of the third target candidate point, which can be processed as follows:
Step one, selecting an eighth group of neighbor points corresponding to the third target candidate point from the neighbor map.
Step two, determining the distance between the target query point and the product quantization code of each neighbor point in the eighth group of neighbor points.
Step three, selecting, from the eighth group of neighbor points, a ninth group of neighbor points farthest from the target query point according to the distance between the third target candidate point and the product quantization code of each neighbor point in the eighth group of neighbor points.
Step four, determining the original distance between each neighbor point in the ninth group of neighbor points and the target query point.
Step five, selecting a fourth target candidate point with the largest original distance from the ninth group of neighbor points according to the original distance between each neighbor point in the ninth group of neighbor points and the target query point.
Step six, determining the target furthest adjacent point according to the original distance between the fourth target candidate point and the target query point and the original distance between the third target candidate point and the target query point.
For details of step one to step six, refer to steps S601 to S606.
In one possible implementation, when the original distance between the third target candidate point and the target query point is greater than the original distance between the farthest neighboring candidate point and the target query point, the following processing may be performed:
Selecting an eighth group of neighbor points corresponding to the third target candidate point from the neighbor map. Determining the distance between the target query point and the product quantization code of each neighbor point in the eighth group of neighbor points. Selecting, from the eighth group of neighbor points, a ninth group of neighbor points farthest from the target query point according to the distance between the third target candidate point and the product quantization code of each neighbor point in the eighth group of neighbor points. Determining the distance between the target query point and the product quantization code of each neighbor point in the ninth group of neighbor points, and selecting, according to that distance, a fourth target candidate point farthest from the target query point from the ninth group of neighbor points. Determining the target furthest adjacent point according to the original distance between the fourth target candidate point and the target query point and the original distance between the third target candidate point and the target query point.
In the above process, calculating the original distance between each neighbor point and the query point requires many operations. Screening the neighbor points first by the distance between their product quantization codes and the query point greatly reduces the number of original-distance calculations and improves the efficiency of the furthest-neighbor search.
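The saving described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the sample data, and the `shortlist_size` parameter are hypothetical, and a rounded copy of each vector stands in for the product quantization code. The point is only that the exact distance is evaluated for the shortlist rather than for every neighbor.

```python
import math

def farthest_two_stage(query, neighbor_ids, vectors, approx_distance, shortlist_size):
    """Shortlist by a cheap approximate distance, then pick the farthest by exact distance.

    Returns (best_id, best_distance, number_of_exact_evaluations).
    """
    # Stage 1: rank every neighbor by the approximate distance, farthest first.
    ranked = sorted(neighbor_ids, key=lambda i: approx_distance(query, i), reverse=True)
    shortlist = ranked[:shortlist_size]
    # Stage 2: compute the original (exact) distance only for the shortlist.
    scored = [(math.dist(query, vectors[i]), i) for i in shortlist]
    best_distance, best_id = max(scored)
    return best_id, best_distance, len(scored)

# Hypothetical data: five neighbor points in two dimensions.
vectors = {0: (0.1, 0.2), 1: (3.9, 4.2), 2: (-5.1, 0.3), 3: (1.2, -0.9), 4: (6.8, -7.1)}
# Stand-in for product quantization codes: each coordinate rounded to an integer.
coarse = {i: tuple(round(x) for x in v) for i, v in vectors.items()}

def approx(q, i):
    # Approximate distance computed on the compressed representation only.
    return math.dist(q, coarse[i])

query = (0.0, 0.0)
best_id, best_distance, exact_evaluations = farthest_two_stage(
    query, list(vectors), vectors, approx, shortlist_size=2)
```

With these sample values, two exact distances are computed instead of five, and the selected point is still the one that is truly farthest from the query.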
Existing furthest-neighbor algorithms are not mature enough. With the method for determining the target furthest adjacent point provided by this embodiment, an accurate target furthest adjacent point can be determined through iterative processing, which enables accurate recommendation. Because the iterative processing consumes considerable computing resources, combining it with product quantization coding increases the operation speed and reduces the waste of computing resources.
In this embodiment, a device for determining the furthest adjacent candidate point is further provided. The device is used to implement the foregoing embodiments and preferred embodiments, and what has already been described is not repeated here. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
The present embodiment provides an apparatus for determining a furthest adjacent candidate point, as shown in fig. 7, including:
an obtaining module 701, configured to obtain a target query point, a target data set, a neighbor graph corresponding to the target data set, a far-neighbor graph, and a product quantization code of each data point in the target data set;
a selecting module 702, configured to select a first starting point in the target data set as a nearest neighbor candidate point corresponding to the target query point, where the first starting point is any data point in the target data set;
a determining module 703, configured to extract a first set of neighbor points corresponding to the first starting point from the neighbor map, and determine a distance between the target query point and a product quantization code of each neighbor point in the first set of neighbor points;
a selecting module 702, configured to select a second set of neighbor points closest to the target query point from the first set of neighbor points according to the distance between the product quantization codes of the target query point and each neighbor point in the first set of neighbor points; and to select a first target candidate point with the minimum original distance from the second set of neighbor points according to the original distance between each neighbor point in the second set of neighbor points and the target query point;
a determining module 703, configured to determine, when the original distance between the first target candidate point and the target query point is greater than or equal to the original distance between the first starting point and the target query point, a furthest adjacent candidate point according to the first target candidate point, the target query point and the far-neighbor graph.
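The text does not spell out here how the distance between the target query point and a product quantization code is computed. One common approach, shown below as a sketch under that assumption, is asymmetric distance computation: the query is split into sub-vectors, a table of squared distances to every centroid of each sub-space is built once, and the distance to any encoded point is then a sum of table lookups. The codebooks, codes, and function names are hypothetical.

```python
def build_lookup_table(query, codebooks):
    """Squared distance from each query sub-vector to every centroid of its sub-space."""
    sub_dim = len(query) // len(codebooks)
    table = []
    for m, centroids in enumerate(codebooks):
        sub_query = query[m * sub_dim:(m + 1) * sub_dim]
        table.append([
            sum((q - c) ** 2 for q, c in zip(sub_query, centroid))
            for centroid in centroids
        ])
    return table

def pq_distance(table, code):
    """Approximate squared distance: one lookup per sub-space, then a sum."""
    return sum(table[m][centroid_index] for m, centroid_index in enumerate(code))

# Hypothetical setup: 4-dimensional vectors, 2 sub-spaces, 2 centroids per sub-space.
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],  # centroids for dimensions 0-1
    [(0.0, 0.0), (2.0, 2.0)],  # centroids for dimensions 2-3
]
# Each data point is stored as one centroid index per sub-space.
codes = {"a": (0, 0), "b": (1, 0), "c": (1, 1)}

query = (0.0, 0.0, 0.0, 0.0)
table = build_lookup_table(query, codebooks)
approx = {name: pq_distance(table, code) for name, code in codes.items()}
```

The lookup table is built once per query, so each neighbor point costs only as many additions as there are sub-spaces, which is why the product quantization distance is cheap compared with the original distance.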
In an alternative embodiment, the determining module 703 is configured to:
determining a third group of neighbor points corresponding to the first target candidate point in the far-neighbor graph;
determining an original distance between each neighbor point in the third group of neighbor points and the target query point;
and selecting the neighbor point with the largest original distance from the third group of neighbor points as the farthest neighbor candidate point according to the original distance between each neighbor point in the third group of neighbor points and the target query point.
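The three operations above can be sketched as follows. This is only an illustration: the far-neighbor graph is assumed to be a dictionary that maps a point identifier to a list of identifiers of points far from it, and the sample vectors are made up.

```python
import math

def furthest_candidate_from_far_graph(query, candidate_id, far_graph, vectors):
    """Pick, among the far neighbors of candidate_id, the point farthest from the query."""
    # Third group of neighbor points: the far-neighbor list of the candidate.
    third_group = far_graph[candidate_id]
    # Original (exact) distance to the query decides the winner.
    return max(third_group, key=lambda i: math.dist(query, vectors[i]))

# Hypothetical data set and far-neighbor graph.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (10.0, 0.0), 3: (0.0, -8.0), 4: (-9.0, 9.0)}
far_graph = {1: [2, 3, 4]}

query = (0.5, 0.0)
furthest = furthest_candidate_from_far_graph(query, 1, far_graph, vectors)
```

Because the far-neighbor list is short, the original distances are computed directly here without a product quantization screening pass.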
In an alternative embodiment, when the original distance between the first target candidate point and the target query point is smaller than the original distance between the first starting point and the target query point, the determining module 703 is further configured to:
selecting a fourth group of neighbor points corresponding to the first target candidate point from the neighbor map, and determining the distance between the product quantization codes of the target query point and each neighbor point in the fourth group of neighbor points;
selecting a fifth group of neighbor points nearest to the target query point from the fourth group of neighbor points according to the distance between the product quantization codes of the target query point and each neighbor point in the fourth group of neighbor points;
determining an original distance between each neighbor point in the fifth group of neighbor points and the target query point;
selecting a second target candidate point with the minimum original distance from the fifth group of neighbor points according to the original distance between each neighbor point in the fifth group of neighbor points and the target query point;
so as to determine the furthest adjacent candidate point according to the original distance between the second target candidate point and the target query point, the original distance between the first starting point and the target query point and the far-adjacent graph;
or,
and determining the furthest adjacent candidate point according to the original distance between the second target candidate point and the target query point, the original distance between the first starting point and the target query point, the far-neighbor graph and the neighbor graph.
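Read together with the preceding embodiment, the operations above amount to repeating the same screening step from each new candidate until the candidate no longer gets closer to the query. The loop below is a sketch of that reading, not the patented procedure itself: the names, the `shortlist_size` parameter, and the sample graph are hypothetical, and the exact distance is reused as a stand-in for the product quantization distance to keep the example short.

```python
import math

def descend_to_nearest(query, start_id, neighbor_graph, vectors, approx_distance, shortlist_size):
    """Greedy walk on the neighbor graph toward the query.

    Returns (current, candidate): the last accepted point and the candidate
    that failed to improve on it, at which point the walk stops.
    """
    current = start_id
    current_dist = math.dist(query, vectors[current])
    while True:
        neighbors = neighbor_graph[current]
        # Screen the neighbors by approximate distance, nearest first.
        shortlist = sorted(neighbors, key=lambda i: approx_distance(query, i))[:shortlist_size]
        # Re-rank the shortlist by original distance and keep the nearest.
        candidate = min(shortlist, key=lambda i: math.dist(query, vectors[i]))
        candidate_dist = math.dist(query, vectors[candidate])
        if candidate_dist >= current_dist:
            # No improvement: the far-neighbor graph is consulted next.
            return current, candidate
        current, current_dist = candidate, candidate_dist

# Hypothetical one-dimensional data set arranged as a chain.
vectors = {0: (10.0,), 1: (7.0,), 2: (4.0,), 3: (1.0,), 4: (-2.0,)}
neighbor_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

def approx(q, i):
    # Stand-in for the product quantization distance.
    return math.dist(q, vectors[i])

result = descend_to_nearest((0.0,), 0, neighbor_graph, vectors, approx, shortlist_size=2)
```

The function returns both points because the text determines the furthest adjacent candidate point from the candidate and the far-neighbor graph once the stopping condition is met; which of the two is passed on is left to the caller.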
In an optional embodiment, the selecting module 702 is further configured to select a sixth set of neighbor points corresponding to the furthest adjacent candidate point from the neighbor map;
a determining module 703, configured to determine a distance between the product quantization code of the target query point and each neighbor point in the sixth set of neighbor points;
the selecting module 702 is further configured to select, from the sixth set of neighbor points, a seventh set of neighbor points that are farthest from the target query point according to the distance between the product quantization codes of the target query point and each neighbor point in the sixth set of neighbor points;
a determining module 703, configured to determine an original distance between each neighbor point in the seventh set of neighbor points and the target query point;
the selecting module 702 is further configured to select a third target candidate point with the largest original distance from the seventh set of neighbor points according to the original distance between each neighbor point in the seventh set of neighbor points and the target query point;
the determining module 703 is further configured to determine a target farthest neighboring point corresponding to the target query point according to the original distance between the third target candidate point and the target query point and the original distance between the farthest neighboring candidate point and the target query point.
In an alternative embodiment, the determining module 703 is configured to:
and when the original distance between the third target candidate point and the target query point is greater than or equal to the original distance between the farthest adjacent candidate point and the target query point, determining the farthest adjacent candidate point as the target farthest adjacent point.
In an alternative embodiment, the determining module 703 is configured to:
when the original distance between the third target candidate point and the target query point is smaller than the original distance between the farthest adjacent candidate point and the target query point, selecting an eighth group of neighbor points corresponding to the third target candidate point from the neighbor map;
determining the distance between the product quantization codes of the target query point and each neighbor point in the eighth set of neighbor points;
selecting a ninth group of neighbor points farthest from the target query point from the eighth group of neighbor points according to the distance between the product quantization codes of the third target candidate point and each neighbor point in the eighth group of neighbor points;
determining an original distance between each neighbor point in the ninth set of neighbor points and the target query point;
selecting a fourth target candidate point with the largest original distance from the ninth group of neighbor points according to the original distance between each neighbor point in the ninth group of neighbor points and the target query point;
and determining the furthest adjacent point of the target according to the original distance between the fourth target candidate point and the target query point and the original distance between the third target candidate point and the target query point.
Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.
The device for determining the furthest adjacent candidate point in this embodiment is presented in the form of functional units, where a unit refers to an ASIC (Application Specific Integrated Circuit), a processor and memory executing one or more software or firmware programs, and/or other devices that can provide the functionality described above.
The embodiment of the invention also provides a computer device, which has the device for determining the furthest adjacent candidate point shown in fig. 7.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention. As shown in fig. 8, the computer device includes one or more processors 10, a memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is taken as an example in fig. 8.
The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a generic array logic, or any combination thereof.
The memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to implement the methods shown in the above embodiments.
The memory 20 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area; the storage data area may store data created according to the use of the computer device, etc. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The computer device further comprises an input device 30 and an output device 40. The processor 10, memory 20, input device 30, and output device 40 may be connected by a bus or by other means; connection by a bus is taken as an example in fig. 8.
The input device 30 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer device. Examples include a touch screen, a keypad, a mouse, a touch pad, one or more mouse buttons, and the like. The output device 40 may comprise a display device or the like. Such display devices include, but are not limited to, liquid crystal displays, light-emitting diode displays, and plasma displays. In some alternative implementations, the display device may be a touch screen.
The embodiments of the present invention also provide a computer-readable storage medium. The method according to the embodiments of the present invention described above may be implemented in hardware or firmware, or as computer code that can be recorded on a storage medium, or as computer code that is originally stored on a remote storage medium or a non-transitory machine-readable storage medium, downloaded over a network, and stored on a local storage medium, so that the method described herein can be carried out by such software stored on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or special-purpose hardware. The storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk or the like; further, the storage medium may also comprise a combination of memories of the kinds described above. It will be appreciated that a computer, processor, microprocessor controller or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the methods illustrated by the above embodiments.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims (10)

1. A method of determining a furthest neighbor candidate point, the method comprising:
acquiring a target query point, a target data set, a neighbor graph corresponding to the target data set, a far neighbor graph and a product quantization code of each data point in the target data set;
selecting a first starting point from the target data set as a nearest neighbor candidate point corresponding to the target query point, wherein the first starting point is any data point in the target data set;
extracting a first group of neighbor points corresponding to the first starting point from the neighbor map, and determining the distance between the product quantization codes of the target query point and each neighbor point in the first group of neighbor points;
selecting a second group of neighbor points closest to the target query point from the first group of neighbor points according to the distance between the product quantization codes of the target query point and each neighbor point in the first group of neighbor points;
selecting a first target candidate point with the minimum original distance from the second group of neighbor points according to the original distance between each neighbor point in the second group of neighbor points and the target query point;
and when the original distance between the first target candidate point and the target query point is larger than or equal to the original distance between the first starting point and the target query point, determining the furthest adjacent candidate point according to the first target candidate point, the target query point and the far-adjacent graph.
2. The method of claim 1, wherein the determining the furthest neighboring candidate point from the first target candidate point, the target query point, and the far-neighboring graph when a first original distance between the first target candidate point and the target query point is greater than or equal to a second original distance between the first starting point and the target query point comprises:
determining a third group of neighbor points corresponding to the first target candidate point in the far-neighbor graph;
determining an original distance between each neighbor point in the third set of neighbor points and the target query point;
and selecting the neighbor point with the largest original distance from the third group of neighbor points as the farthest neighbor candidate point according to the original distance between each neighbor point in the third group of neighbor points and the target query point.
3. The method of claim 1, wherein when the original distance between the first target candidate point and the target query point is less than the original distance between the first starting point and the target query point, the method further comprises:
selecting a fourth group of neighbor points corresponding to the first target candidate point from the neighbor map, and determining the distance between the product quantization codes of the target query point and each neighbor point in the fourth group of neighbor points;
selecting a fifth group of neighbor points closest to the target query point from the fourth group of neighbor points according to the distance between the product quantization codes of the target query point and each neighbor point in the fourth group of neighbor points;
determining an original distance between each neighbor point in the fifth set of neighbor points and the target query point;
selecting a second target candidate point with the minimum original distance from the fifth group of neighbor points according to the original distance between each neighbor point in the fifth group of neighbor points and the target query point;
so as to determine the furthest adjacent candidate point according to the original distance between the second target candidate point and the target query point, the original distance between the first starting point and the target query point and the far-adjacent graph;
or,
and determining the furthest adjacent candidate point according to the original distance between the second target candidate point and the target query point, the original distance between the first starting point and the target query point, the far-adjacent graph and the neighbor graph.
4. A method according to any one of claims 1-3, wherein the method further comprises:
selecting a sixth group of neighbor points corresponding to the furthest neighbor candidate points from the neighbor graph;
determining a distance between the product quantization codes of the target query point and each neighbor point in the sixth set of neighbor points;
selecting a seventh group of neighbor points farthest from the target query point from the sixth group of neighbor points according to the distance between the product quantization codes of the target query point and each neighbor point in the sixth group of neighbor points;
determining an original distance between each neighbor point in the seventh set of neighbor points and the target query point;
selecting a third target candidate point with the largest original distance from the seventh group of neighbor points according to the original distance between each neighbor point in the seventh group of neighbor points and the target query point;
and determining a target farthest adjacent point corresponding to the target query point according to the original distance between the third target candidate point and the target query point and the original distance between the farthest adjacent candidate point and the target query point.
5. The method of claim 4, wherein the determining the target furthest adjacent point corresponding to the target query point based on the original distance between the third target candidate point and the target query point and the original distance between the furthest adjacent candidate point and the target query point comprises:
and determining the furthest adjacent candidate point as the target furthest adjacent point when the original distance between the third target candidate point and the target query point is greater than or equal to the original distance between the furthest adjacent candidate point and the target query point.
6. The method of claim 4, wherein the determining the target furthest adjacent point corresponding to the target query point based on the original distance between the third target candidate point and the target query point and the original distance between the furthest adjacent candidate point and the target query point comprises:
when the original distance between the third target candidate point and the target query point is smaller than the original distance between the farthest adjacent candidate point and the target query point, selecting an eighth group of neighbor points corresponding to the third target candidate point from the neighbor map;
determining a distance between the product quantization codes of the target query point and each neighbor point in the eighth set of neighbor points;
selecting a ninth group of neighbor points farthest from the target query point from the eighth group of neighbor points according to the distance between the product quantization codes of the third target candidate point and each neighbor point in the eighth group of neighbor points;
determining an original distance between each neighbor point in the ninth set of neighbor points and the target query point;
selecting a fourth target candidate point with the largest original distance from the ninth group of neighbor points according to the original distance between each neighbor point in the ninth group of neighbor points and the target query point;
and determining the furthest adjacent point of the target according to the original distance between the fourth target candidate point and the target query point and the original distance between the third target candidate point and the target query point.
7. A method according to any one of claims 1-3, wherein the method further comprises:
and determining the furthest adjacent candidate point as a target furthest adjacent point corresponding to the target query point.
8. An apparatus for determining a furthest neighbor candidate point, the apparatus comprising:
The acquisition module is used for acquiring a target query point, a target data set, a neighbor graph corresponding to the target data set, a far neighbor graph and product quantization codes of each data point in the target data set;
the selecting module is used for selecting a first starting point from the target data set as a nearest neighbor candidate point corresponding to the target query point, wherein the first starting point is any data point in the target data set;
a determining module, configured to extract a first set of neighbor points corresponding to the first starting point from the neighbor map, and determine a distance between product quantization codes of the target query point and each neighbor point in the first set of neighbor points;
the selecting module is used for selecting a second group of neighbor points closest to the target query point from the first group of neighbor points according to the distance between the product quantization codes of the target query point and each neighbor point in the first group of neighbor points; selecting a first target candidate point with the minimum original distance from the second group of neighbor points according to the original distance between each neighbor point in the second group of neighbor points and the target query point;
The determining module is configured to determine, when an original distance between the first target candidate point and the target query point is greater than or equal to an original distance between the first starting point and the target query point, a farthest neighboring candidate point according to the first target candidate point, the target query point and the far neighboring graph.
9. A computer device, comprising:
a memory and a processor in communication with each other, the memory having stored therein computer instructions which, upon execution, perform the method of determining a furthest neighbor candidate point of any of claims 1 to 7.
10. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of determining the furthest neighbor point of any of claims 1 to 7.
CN202311266943.4A 2023-09-27 2023-09-27 Method, device, computer equipment and storage medium for determining furthest adjacent candidate point Pending CN117349518A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311266943.4A CN117349518A (en) 2023-09-27 2023-09-27 Method, device, computer equipment and storage medium for determining furthest adjacent candidate point

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311266943.4A CN117349518A (en) 2023-09-27 2023-09-27 Method, device, computer equipment and storage medium for determining furthest adjacent candidate point

Publications (1)

Publication Number Publication Date
CN117349518A true CN117349518A (en) 2024-01-05

Family

ID=89365968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311266943.4A Pending CN117349518A (en) 2023-09-27 2023-09-27 Method, device, computer equipment and storage medium for determining furthest adjacent candidate point

Country Status (1)

Country Link
CN (1) CN117349518A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination