CN114329135A - Method, device, equipment and storage medium for offline sorting of index points - Google Patents

Info

Publication number: CN114329135A
Authority: CN (China)
Prior art keywords: index, index point, points, point, target
Legal status: Pending (the listed status is an assumption, not a legal conclusion)
Application number: CN202111490990.8A
Other languages: Chinese (zh)
Inventor: 赵铭鑫
Current Assignee: Tencent Technology Shenzhen Co Ltd (the listed assignee may be inaccurate)
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by: Tencent Technology Shenzhen Co Ltd
Priority to: CN202111490990.8A
Publication: CN114329135A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application provide an offline sorting method, apparatus, device, and storage medium for index points, which can be applied to scenarios such as cloud technology, artificial intelligence, intelligent traffic, and in-vehicle systems. The method comprises: sorting the index points in a search graph based on the target association degrees among them to obtain an index point sorting result, and then storing the index points in an index memory according to that result. Because associated index points are stored together in the index memory, each memory access during a search can fetch associated index points for calculation, which reduces the number of memory accesses and improves search speed. During sorting, the association degree is calculated only between the unsorted index points and the sorted index points in a sliding window, rather than between the unsorted index points and all sorted index points, which reduces repeated association degree calculations and improves the efficiency of sorting the index points.

Description

Method, device, equipment and storage medium for offline sorting of index points
Technical Field
The embodiments of the present invention relate to the technical field of search, and in particular to an offline sorting method, apparatus, device, and storage medium for index points.
Background
With the development of internet technology, the amount of information on the internet keeps growing. To help a target object quickly obtain the required information from massive amounts of information, an information search function has become an indispensable part of many applications.
In the related art, a search traverses the index points in a search library according to a certain rule to obtain a search result, where each index point represents a retrievable object, such as an article or a video.
However, in the search library of the related art, the index points are stored randomly in the index memory, so acquiring index points from the index memory during a search takes a long time, which results in low search efficiency.
Disclosure of Invention
The embodiment of the application provides an offline sorting method, device, equipment and storage medium of index points, which are used for improving the search efficiency.
In one aspect, an embodiment of the present application provides an offline sorting method for index points, where the method includes:
selecting one index point from a plurality of index points of a search graph as an initial sorted index point and adding it to a sliding window;
determining a target association degree between each remaining (not yet sorted) index point and the sorted index points in the sliding window;
selecting a target index point from the remaining index points based on the target association degrees and adding the target index point to the sliding window;
iteratively executing the steps of determining the target association degrees and selecting a target index point until all of the plurality of index points have been added to the sliding window;
and taking the order in which the index points were added to the sliding window as an index point sorting result, and storing the plurality of index points in an index memory according to the index point sorting result.
In one aspect, an embodiment of the present application provides an offline index point sorting apparatus, where the apparatus includes:
a selection module, configured to select one index point from a plurality of index points of a search graph as an initial sorted index point and add it to a sliding window;
a sorting module, configured to determine a target association degree between each remaining (not yet sorted) index point and the sorted index points in the sliding window; select a target index point from the remaining index points based on the target association degrees and add it to the sliding window; and iteratively execute these steps until all of the plurality of index points have been added to the sliding window;
and a storage module, configured to take the order in which the index points were added to the sliding window as an index point sorting result, and store the plurality of index points in an index memory according to the index point sorting result.
Optionally, the sorting module is further configured to:
after each iteration, if the number of sorted index points in the sliding window is greater than a preset threshold, removing the earliest-added sorted index point from the sliding window, where the preset threshold is determined based on the central processing unit (CPU) cache.
Optionally, the sorting module is specifically configured to:
sorting the remaining index points by target association degree to obtain a target association sorting result;
and selecting a target index point from the remaining index points based on the target association sorting result, and adding it to the sliding window as a sorted index point.
Optionally, the sorting module is specifically configured to:
for each remaining index point, performing the following steps:
acquiring the historical association degree, from the previous iteration, between the index point and the sorted index points in the sliding window;
determining a first sub-association degree between the index point and the target index point newly added to the sliding window in the previous iteration;
and determining the target association degree between the index point and the sorted index points in the sliding window based on the first sub-association degree and the historical association degree.
Optionally, the sorting module is specifically configured to:
if the earliest-added sorted index point was removed from the sliding window after the previous iteration, determining a second sub-association degree between the index point and that removed sorted index point;
and determining the target association degree between the index point and the sorted index points in the sliding window based on the first sub-association degree, the second sub-association degree, and the historical association degree.
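As an illustration of the incremental update described in the two preceding optional configurations, the following Python sketch keeps a running association value for every unsorted index point and adjusts it only for the index point that just entered the window and the one that just left it. The text does not state how the three quantities are combined; adding the first sub-association degree and subtracting the second is an assumption that holds when the target association degree is a sum of pairwise scores. The score function, the example graph, and all names are hypothetical.

```python
from collections import deque


def score(graph, a, b):
    """Hypothetical association score: 2 if a and b are direct neighbors,
    plus 1 for every neighbor they share (an assumption, not from the text)."""
    return (2 if b in graph[a] else 0) + len(graph[a] & graph[b])


def incremental_order(graph, start, window_size):
    """Greedy ordering that updates association degrees incrementally."""
    window = deque([start])
    order = [start]
    remaining = set(graph) - {start}
    # Historical association degree of each unsorted point with the window.
    assoc = {p: score(graph, p, start) for p in remaining}
    while remaining:
        # Largest association degree wins; ties go to the smallest id.
        target = max(sorted(remaining), key=assoc.get)
        remaining.remove(target)
        del assoc[target]
        window.append(target)
        order.append(target)
        # Drop the earliest-added point once the window exceeds the threshold.
        evicted = window.popleft() if len(window) > window_size else None
        for p in remaining:
            assoc[p] += score(graph, p, target)       # first sub-association
            if evicted is not None:
                assoc[p] -= score(graph, p, evicted)  # second sub-association
    return order


# Hypothetical path-shaped graph: 5 - 3 - 1 - 7 - 6 - 2 - 4
graph = {1: {3, 7}, 2: {4, 6}, 3: {1, 5}, 4: {2}, 5: {3}, 6: {2, 7}, 7: {1, 6}}
order = incremental_order(graph, start=5, window_size=2)
print(order)  # [5, 3, 1, 7, 6, 2, 4]
```

Each iteration costs at most two score evaluations per unsorted index point, instead of one per sorted index point in the window.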
Optionally, the sorting module is specifically configured to:
aiming at each index point, the following steps are respectively executed:
determining an association score between the index point and each sorted index point in the sliding window;
and determining the target association degree between the index point and the sorted index points in the sliding window based on the obtained association scores.
Optionally, the sorting module is specifically configured to:
obtaining the historical association sorting result of the index points from the previous iteration;
and adjusting the historical association sorting result based on the target association degrees to obtain the target association sorting result.
Optionally, the index points in the target association sorting result are arranged in descending order of target association degree;
the sorting module is specifically configured to:
taking the first index point in the target association sorting result as the target index point, and adding the target index point to the sliding window.
Optionally, a search module is further included;
the search module is specifically configured to:
obtaining, from the index memory, a candidate index point for a search condition and at least one other index point stored contiguously with the candidate index point;
storing the at least one other index point in the central processing unit (CPU) cache;
if the at least one other index point includes a neighbor index point of the candidate index point, acquiring the neighbor index point of the candidate index point from the CPU cache;
and determining a search result for the search condition based on the candidate index point and the neighbor index point of the candidate index point.
In one aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above-mentioned offline sorting method for index points when executing the program.
In one aspect, embodiments of the present application provide a computer-readable storage medium, which stores a computer program executable by a computer device, and when the program runs on the computer device, the computer device is caused to execute the steps of the above-mentioned offline ordering method for index points.
In one aspect, the present application provides a computer program product, which includes a computer program stored on a computer-readable storage medium, where the computer program includes program instructions, and when the program instructions are executed by a computer device, the computer device executes the steps of the above-mentioned method for offline sorting of index points.
In the embodiments of the present application, the index points are sorted based on the target association degrees among them to obtain an index point sorting result, and the index points are then stored in the index memory according to that result, so that associated index points are stored together in the index memory. As a result, each memory access during a search can fetch associated index points for calculation, which reduces the number of memory accesses and improves search speed. In addition, during sorting the association degree is calculated only between the unsorted index points and the sorted index points in the sliding window, rather than between the unsorted index points and all sorted index points, which reduces repeated association degree calculations and improves the efficiency of sorting the index points.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic structural diagram of a system architecture according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a search interface provided by an embodiment of the present application;
Fig. 3 is a schematic diagram of a search results interface provided by an embodiment of the present application;
Fig. 4 is a schematic flowchart of an offline sorting method for index points according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of a method for obtaining index points from a memory according to an embodiment of the present application;
Fig. 6 is a schematic flowchart of a method for obtaining index points from a memory according to an embodiment of the present application;
Fig. 7 is a schematic diagram of a processing result after an iterative process is ended according to an embodiment of the present application;
Fig. 8 is a schematic diagram of a processing result after an iterative process is ended according to an embodiment of the present application;
Fig. 9 is a schematic diagram of a common neighbor index point according to an embodiment of the present application;
Fig. 10 is a schematic diagram of a processing result after an iterative process is ended according to an embodiment of the present application;
Fig. 11 is a schematic diagram of a processing result after an iterative process is ended according to an embodiment of the present application;
Fig. 12 is a schematic diagram of a processing result after an iterative process is ended according to an embodiment of the present application;
Fig. 13 is a schematic diagram of a processing result after an iterative process is ended according to an embodiment of the present application;
Fig. 14a is a schematic diagram of a processing result after an iterative process is ended according to an embodiment of the present application;
Fig. 14b is a schematic diagram of a processing result after an iterative process is ended according to an embodiment of the present application;
Fig. 14c is a schematic diagram of a processing result after an iterative process is ended according to an embodiment of the present application;
Fig. 15 is a schematic diagram of a search scenario provided in an embodiment of the present application;
Fig. 16 is a schematic flowchart of a method for constructing an HNSW graph according to an embodiment of the present application;
Fig. 17 is a schematic diagram of an HNSW graph provided in an embodiment of the present application;
Fig. 18 is a schematic flowchart of an offline sorting method for index points according to an embodiment of the present application;
Fig. 19 is a schematic structural diagram of an offline index point sorting apparatus according to an embodiment of the present application;
Fig. 20 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more clearly apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
For convenience of understanding, terms referred to in the embodiments of the present invention are explained below.
Navigable small world: Navigable Small World, NSW for short, a graph structure used for approximate nearest neighbor search.
Hierarchical navigable small world: Hierarchical Navigable Small World, HNSW for short, a multi-layer graph structure used for approximate nearest neighbor search.
Central processing unit: Central Processing Unit, CPU for short, the computing and control core of a computer system and the final execution unit for information processing and program execution.
CPU cache: a component that reduces the average time the processor needs to access memory. When the processor issues a memory access request, it first checks whether the requested data is in the cache. If it is (a hit), the data is returned directly without accessing memory; if it is not, the corresponding data is first loaded from memory into the cache and then returned to the processor.
The following is a description of the design concept of the embodiments of the present application.
In the related art, a search traverses the index points in a search library according to a certain rule to obtain a search result, where each index point represents a retrievable object, such as an article or a video.
However, in the search library of the related art, the index points are stored randomly in the index memory, so acquiring index points from the index memory during a search takes a long time, which results in low search efficiency.
Analysis shows that, during a search, the CPU of the computer device traverses the index points in the search library according to a certain rule. For each traversed candidate index point, the candidate index point and its neighbor index points must be obtained from the index memory for calculation. When the CPU obtains a candidate index point from the index memory, it also obtains some additional index points and stores them in the CPU cache; these additional index points are the ones stored contiguously with the candidate index point in the index memory.
Suppose the index points are sorted according to the association relationships among them and stored in the index memory according to the sorting result. Then, when the CPU obtains a candidate index point from the index memory, the additional index points fetched along with it are, with high probability, neighbor index points of the candidate index point. The CPU can therefore obtain the neighbor index points directly from the CPU cache and perform the calculation on the candidate index point and its neighbors without fetching the neighbors from the index memory, which effectively improves search speed.
In view of this, an embodiment of the present application provides the following offline sorting method for index points:
One index point is selected from a plurality of index points of the search graph as an initial sorted index point and added to the sliding window. A target association degree is determined between each remaining (not yet sorted) index point and the sorted index points in the sliding window. A target index point is selected from the remaining index points based on the target association degrees and added to the sliding window. These two steps are executed iteratively until all of the plurality of index points have been added to the sliding window. The order in which the index points were added to the sliding window is taken as the index point sorting result, and the plurality of index points are stored in an index memory according to the index point sorting result.
In the embodiments of the present application, the index points are sorted based on the target association degrees among them to obtain an index point sorting result, and the index points are then stored in the index memory according to that result, so that associated index points are stored together in the index memory. As a result, each memory access during a search can fetch associated index points for calculation, which reduces the number of memory accesses and improves search speed. In addition, during sorting the association degree is calculated only between the unsorted index points and the sorted index points in the sliding window, rather than between the unsorted index points and all sorted index points, which reduces repeated association degree calculations and improves the efficiency of sorting the index points.
Referring to fig. 1, a system architecture diagram applicable to the embodiment of the present application is shown, where the system architecture includes at least terminal devices 101 and servers 102, the number of the terminal devices 101 may be one or more, and the number of the servers 102 may also be one or more, where the present application does not specifically limit the number of the terminal devices 101 and the servers 102.
A target application with a search function is pre-installed on the terminal device 101, where the target application is a client application, a web application, an applet, or the like. The terminal device 101 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart home appliance, an intelligent voice interaction device, an intelligent in-vehicle device, or the like, but is not limited thereto.
The server 102 is a background server of the target application, and the server 102 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like. The terminal device 101 and the server 102 may be directly or indirectly connected through wired or wireless communication, and the present application is not limited thereto.
The offline sorting method for the index points in the embodiment of the present application may be executed by the terminal device 101, may also be executed by the server 102, and may also be executed by the terminal device 101 and the server 102 interactively.
For example, when the server 102 performs the offline sorting method for index points in the embodiment of the present application, the method includes the following steps:
The server 102 selects one index point from the plurality of index points of the search graph as an initial sorted index point and adds it to the sliding window. It determines a target association degree between each remaining (not yet sorted) index point and the sorted index points in the sliding window, then selects a target index point from the remaining index points based on the target association degrees and adds it to the sliding window. These two steps are executed iteratively until all of the plurality of index points have been added to the sliding window. The order in which the index points were added to the sliding window is taken as the index point sorting result, and the plurality of index points are stored in an index memory according to the index point sorting result.
In practical applications, the offline sorting method for index points in the embodiment of the present application can be applied to search scenarios such as article search, video search, and commodity search. The following takes an article search scenario as an example:
The server 102 constructs an HNSW graph offline and sorts the index points in the HNSW graph using the method in the embodiment of the present application to obtain an index point sorting result, where each index point represents an article. The index points are then stored in the index memory according to the index point sorting result.
The terminal device 101 displays a search interface of an instant messaging application. As shown in fig. 2, the search interface includes a search box 201, a search type 202, and a recommended search 203. The user enters the target entry "XX park" in the search box 201 and submits it. The terminal device 101 then sends a search request carrying the target entry to the server 102.
The server 102 searches the HNSW graph for k neighbor index points of the target entry. In each search iteration, it obtains from the index memory a candidate index point and a plurality of other index points stored contiguously with the candidate index point (in the first iteration, any index point in the uppermost layer of the HNSW graph is used as the candidate index point), and stores the plurality of other index points in the CPU cache.
Since the index points in the HNSW graph are stored in the index memory according to the index point sorting result, the other index points stored contiguously with the candidate index point include, with high probability, the neighbor index points of the candidate index point.
For the candidate index point, the server 102 calculates the distance between the candidate index point and the target entry; at the same time, it obtains each neighbor index point of the candidate index point from the CPU cache and calculates the distance between each neighbor index point and the target entry. It then selects, from the candidate index point and its neighbor index points, the index point closest to the target entry as the new candidate index point, and repeats the search process until a search end condition is met. Finally, k neighbor index points are selected from the index points that served as candidate index points.
Assume that k is 2 and that the articles corresponding to the k neighbor index points are article A and article B. The server 102 sends the relevant information of article A and article B to the terminal device 101, and the terminal device 101 displays a search result interface of the instant messaging application. As shown in fig. 3, the search result interface displays the relevant information of article A in a first area 301 and the relevant information of article B in a second area 302.
Based on the system architecture diagram shown in fig. 1, an embodiment of the present application provides a flow of an index point offline sorting method, as shown in fig. 4, where the flow of the method is executed by a computer device, which may be the terminal device 101 and/or the server 102 shown in fig. 1, and the method includes the following steps:
step S401, selecting one index point from the plurality of index points of the search graph as an initial sorted index point, and adding the selected index point to the sliding window.
Specifically, the plurality of index points of the search graph are unsorted index points, and the search graph may be an HNSW graph, an NSW graph, or the like. An index point may represent a video, an article, a commodity, or the like. When an index point represents a video, it includes, but is not limited to, index data such as the video category, video title, and video details. When an index point represents an article, it includes, but is not limited to, index data such as the article category, article title, and article details. When an index point represents a commodity, it includes, but is not limited to, index data such as the commodity category, commodity name, and commodity price.
In the initial stage of the index point sorting, no sorted index point exists in the sliding window, so that one index point can be randomly selected from a plurality of index points to serve as an initial sorted index point to be added to the sliding window, wherein the size of the sliding window can be fixed or can be dynamically changed.
Step S402, determining a target association degree between each remaining (not yet sorted) index point and the sorted index points in the sliding window.
Step S403, selecting a target index point from the remaining index points based on the target association degrees and adding the target index point to the sliding window.
Specifically, the target association degree may also be referred to as target closeness. The greater the target association degree, the closer the relationship between the index point and the sorted index points in the sliding window; the smaller the target association degree, the more distant that relationship.
Steps S402 and S403 are executed iteratively until all of the plurality of index points in the search graph have been added to the sliding window.
Step S404, taking the order in which the index points were added to the sliding window as the index point sorting result, and storing the plurality of index points in the index memory according to the index point sorting result.
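Steps S401 to S404 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the association score (direct adjacency plus shared neighbors), the tie-breaking rule, the choice of start point, and the example graph are all hypothetical. The sketch recomputes the target association degree against the whole window in every iteration, which is the simplest reading of step S402.

```python
from collections import deque


def association_score(graph, a, b):
    """Hypothetical pairwise score: 2 for a direct edge plus 1 per shared neighbor."""
    return (2 if b in graph[a] else 0) + len(graph[a] & graph[b])


def offline_order(graph, start, window_size=None):
    """Return (order, position) for the index points of `graph`.

    window_size=None keeps every sorted point in the window; an integer
    makes the deque drop the earliest-added point automatically.
    """
    window = deque([start], maxlen=window_size)  # S401: initial sorted point
    order = [start]
    remaining = set(graph) - {start}
    while remaining:
        # S402: target association degree of each unsorted point with the window.
        def target_association(p):
            return sum(association_score(graph, p, q) for q in window)
        # S403: pick the most associated point (ties go to the smallest id).
        target = max(sorted(remaining), key=target_association)
        remaining.remove(target)
        window.append(target)
        order.append(target)
    # S404: the insertion order is the sorting result; `position` gives the
    # slot each index point would occupy in a contiguous index memory.
    position = {p: i for i, p in enumerate(order)}
    return order, position


# Hypothetical path-shaped graph: 5 - 3 - 1 - 7 - 6 - 2 - 4
graph = {1: {3, 7}, 2: {4, 6}, 3: {1, 5}, 4: {2}, 5: {3}, 6: {2, 7}, 7: {1, 6}}
order, position = offline_order(graph, start=5, window_size=2)
print(order)  # [5, 3, 1, 7, 6, 2, 4]
```

In this example index point 7, a neighbor of index point 1, ends up in the slot directly after index point 1.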
Optionally, after the plurality of index points are stored in the index memory according to the index point sorting result, during a search the computer device obtains from the index memory a candidate index point for the search condition and at least one other index point stored contiguously with the candidate index point, and stores the at least one other index point in the CPU cache. If the at least one other index point includes a neighbor index point of the candidate index point, that neighbor index point is obtained from the CPU cache; otherwise, it is obtained from the index memory. A search result for the search condition is then determined based on the candidate index point and its neighbor index points.
Specifically, the search condition may be text, an image, audio, or the like, and the number of the at least one other index point may be the maximum number of index points that the CPU cache can hold. When the search graph is an HNSW graph or an NSW graph, the search proceeds iteratively. In each iteration, the candidate index point and all of its neighbor index points are obtained in the manner described above, the distances from the candidate index point and from each of its neighbor index points to the search condition are calculated, and the closest index point becomes the new candidate index point. The search process is repeated until a search end condition is met, which yields the search result for the search condition.
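The iterative search described above can be illustrated with a greedy walk over a single-layer graph. The Python sketch below is a simplification under stated assumptions: it returns only the single closest index point rather than k neighbors, it uses Euclidean distance, and it stops when no neighbor is closer than the current candidate, which is one possible search end condition that the text does not specify. The graph, the vectors, and all names are hypothetical.

```python
import math


def greedy_search(graph, vectors, query, entry):
    """Walk from `entry` toward `query`, always moving to the closest neighbor."""
    candidate = entry
    best = math.dist(vectors[candidate], query)
    path = [candidate]
    while True:
        closest, closest_dist = candidate, best
        # Compare the candidate with each of its neighbor index points.
        for neighbor in sorted(graph[candidate]):
            d = math.dist(vectors[neighbor], query)
            if d < closest_dist:
                closest, closest_dist = neighbor, d
        if closest == candidate:
            # Assumed end condition: no neighbor improves on the candidate.
            return candidate, path
        candidate, best = closest, closest_dist
        path.append(candidate)


# Hypothetical graph and one-dimensional vectors laid out along a line.
graph = {1: {3, 7}, 2: {4, 6}, 3: {1, 5}, 4: {2}, 5: {3}, 6: {2, 7}, 7: {1, 6}}
vectors = {5: (0.0,), 3: (1.0,), 1: (2.0,), 7: (3.0,), 6: (4.0,), 2: (5.0,), 4: (6.0,)}
nearest, path = greedy_search(graph, vectors, query=(4.2,), entry=5)
print(nearest, path)  # 6 [5, 3, 1, 7, 6]
```

Every step of the walk reads the candidate and then its neighbors, which is why storing neighbors contiguously with the candidate reduces the number of index memory accesses.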
For example, the size of the CPU cache is set to 4 index points, and index point 7 is a neighbor index point of index point 1.
If the index points are not sorted, the index points are randomly stored in the index memory, as shown in fig. 5, where the storage locations of the index point 1 and the index point 7 in the memory are far away.
In the searching process, when the CPU of the computer device obtains the index point 1 from the index memory, the index point 2, the index point 3, the index point 4, and the index point 5 are additionally obtained and stored in the CPU cache. Since the CPU cache does not include the index point 7, if the CPU needs to obtain the neighbor index point of the index point 1 (i.e., the index point 7), it needs to obtain the neighbor index point from the index memory.
By adopting the offline sorting method for the index points in the embodiment of the application, after sorting the index points, the obtained sorting result of the index points is as follows: index point 5, index point 3, index point 1, index point 7, index point 6, index point 2, index point 4. And storing the index points in an index memory according to the sorting result of the index points, as shown in fig. 6, wherein the storage positions of the index point 1 and the index point 7 in the memory are adjacent to each other.
In the searching process, when the CPU obtains the index point 1 from the index memory, the index point 7, the index point 6, the index point 2, and the index point 4 are additionally stored in the CPU cache. Since the CPU cache includes the index point 7, if the CPU needs to acquire the neighbor index point of the index point 1 (i.e., the index point 7), the index point can be directly acquired from the CPU cache.
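The two layouts in this example can be compared with a small Python sketch. It is illustrative only: `fetch_block` and `neighbor_in_cache` are hypothetical names, the unsorted layout is assumed to be index points 1 to 7 in numeric order (the source only states that index point 1 and index point 7 are far apart), and the CPU cache is modeled simply as holding the index points stored immediately after the fetched one.

```python
def fetch_block(layout, point, extra=4):
    """Return the fetched point plus the `extra` points stored right after it."""
    start = layout.index(point)
    return layout[start:start + 1 + extra]


def neighbor_in_cache(layout, point, neighbor, extra=4):
    """True if `neighbor` is brought into the cache when `point` is fetched."""
    return neighbor in fetch_block(layout, point, extra)[1:]


unsorted_layout = [1, 2, 3, 4, 5, 6, 7]  # assumed layout before sorting
sorted_layout = [5, 3, 1, 7, 6, 2, 4]    # index point sorting result from the example
```

With these layouts, fetching index point 1 from `unsorted_layout` does not bring in index point 7, while fetching it from `sorted_layout` does.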
In the embodiment of the application, the index points are sorted based on the target association degrees among them to obtain the index point sorting result, and the index points are then stored in the index memory according to that result, so that associated index points are stored together in the index memory. As a result, each memory access during a search retrieves associated index points for calculation, which reduces the number of memory accesses and improves the search speed. In addition, during sorting, association degrees are calculated only between the unsorted index points and the sorted index points in the sliding window rather than between the unsorted index points and all sorted index points, which reduces repeated association degree calculations and improves the efficiency of sorting the index points.
Optionally, after each iteration process, if the number of sorted index points in the sliding window is greater than a preset threshold, the oldest added sorted index point is removed from the sliding window.
Specifically, the preset threshold is determined based on the CPU cache, and the preset threshold is an upper limit value of the index point that can be stored in the CPU cache. After the number of sorted index points in the sliding window reaches a preset threshold value, if an index point is added to the sliding window, the sliding window overflows, and at the moment, the sorted index point which is added earliest is moved out of the sliding window. The sorted index points that are moved out of the sliding window are saved in an array in the order in which they are moved out of the sliding window.
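The overflow handling described above can be sketched with a double-ended queue. This is a minimal sketch; `add_to_window`, `moved_out`, and `threshold` are hypothetical names introduced for illustration.

```python
from collections import deque


def add_to_window(window, moved_out, point, threshold):
    """Add a sorted index point; on overflow, move the earliest one out.

    window:    deque of sorted index points currently in the sliding window
    moved_out: list that records points in the order they leave the window
    threshold: preset upper limit on the number of points in the window
    """
    window.append(point)
    if len(window) > threshold:
        moved_out.append(window.popleft())
```

When sorting finishes, `moved_out` followed by the remaining contents of `window` gives the full index point sorting result.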
In the embodiment of the application, the size of the sliding window is kept within a fixed bound, and the unsorted index points are sorted by determining the target association degree between each unsorted index point and the sorted index points in the sliding window, without determining the target association degree between the unsorted index points and all sorted index points. This avoids repeated calculation, improves the efficiency of sorting the index points, and avoids wasting computing resources.
Optionally, in each iteration process, sorting each index point according to each target relevance degree to obtain a target relevance sorting result. And based on the target association sorting result, selecting the target index points from the index points as sorted index points and adding the sorted index points to the sliding window.
Specifically, the index points may be sorted in descending order of target association degree to obtain the target association sorting result, in which case the first index point in the target association sorting result is taken as the target index point and added to the sliding window. Alternatively, the index points may be sorted in ascending order of target association degree to obtain the target association sorting result, in which case the last index point in the target association sorting result is taken as the target index point and added to the sliding window.
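Both selection rules pick the index point with the largest target association degree. A short sketch follows, assuming the degrees are held in a dictionary; `select_target` is a hypothetical helper, and the behavior on ties is not specified by the source.

```python
def select_target(degrees, descending=True):
    """Rank unsorted index points by target association degree and pick one.

    degrees: dict mapping unsorted index point id -> target association degree
    Returns (target index point, target association sorting result).
    """
    ranking = sorted(degrees, key=lambda p: degrees[p], reverse=descending)
    # Descending order: take the first entry; ascending order: take the last.
    target = ranking[0] if descending else ranking[-1]
    return target, ranking
```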
For example, the obtained plurality of unordered index points are set as index point 1, index point 2, index point 3, index point 4, index point 5, index point 6, and index point 7.
Referring to fig. 7, a schematic diagram of a processing result after the 5 th iteration process is completed, where the sorted index points include index point 5, index point 3, index point 1, index point 7, and index point 6, where the index point 5 and the index point 3 have slid out of the sliding window, the index point 1, the index point 7, and the index point 6 are located in the sliding window, and the unsorted index points include index point 2 and index point 4.
In the 6 th iteration process, determining the target association degree 1 between the index point 2 and the sorted index points (index point 1, index point 7 and index point 6) in the sliding window; a target degree of association 2 between index point 4 and the sorted index points (index point 1, index point 7, index point 6) within the sliding window is determined.
The unsorted index points are sorted in descending order of target association degree, giving the target association sorting result: index point 2, index point 4. Index point 2 is then added to the sliding window, and index point 1 slides out of it. The processing result after the 6th iteration process is shown in fig. 8: the sorted index points include index point 5, index point 3, index point 1, index point 7, index point 6, and index point 2, where index point 5, index point 3, and index point 1 have slid out of the sliding window, and index point 7, index point 6, and index point 2 are located in the sliding window; the unsorted index points include index point 4.
In the embodiment of the application, according to each target relevance, the unsorted index points are sorted to obtain a target relevance sorting result for representing the close degree relation between the unsorted index points and the sorted index points, one unsorted index point is selected from the unsorted index points as the sorted index point based on the target relevance sorting result and is added to the sliding window, the unsorted index points with high closeness with the sorted index points can be effectively and preferentially added to the sliding window, and therefore after the index points are stored in the index memory according to the sorting result, the index points continuously stored in the index memory are the index points with close relation, and the searching speed is further improved in the searching process.
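Putting the window and the selection step together, the whole sorting loop might look like the sketch below. It rests on several assumptions that the source does not fix: association between two index points is treated as a 0/1 relation given by an edge list, the target association degree is recomputed from scratch in every iteration for clarity, ties are broken by the smaller index point id, and all names (`offline_order`, `edges`, `window_size`, `start`) are hypothetical.

```python
from collections import deque


def offline_order(points, edges, window_size, start):
    """Greedy ordering: repeatedly append the unsorted point most associated
    with the sorted points that are still inside the sliding window."""
    assoc = {frozenset(e) for e in edges}
    unsorted_pts = [p for p in points if p != start]
    window = deque([start])
    order = [start]

    def degree(p):
        # Number of window points that p is associated with (0/1 per point).
        return sum(1 for w in window if frozenset((p, w)) in assoc)

    while unsorted_pts:
        # Highest degree wins; the smaller id wins a tie (assumed rule).
        target = max(unsorted_pts, key=lambda p: (degree(p), -p))
        unsorted_pts.remove(target)
        order.append(target)
        window.append(target)
        if len(window) > window_size:
            window.popleft()
    return order
```

Recomputing every degree in every iteration is the simplest variant; the incremental updates described later avoid most of that work.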
Optionally, in each iteration process, the embodiment of the present application adopts at least the following implementations to determine the target association degree between each remaining (unsorted) index point and the sorted index points in the sliding window.
One index point is selected from the plurality of index points as the initial sorted index point and added to the sliding window, and the first iteration process is then executed.
In the first iteration process, the embodiment of the present application determines the target association degree between each remaining index point and the initial sorted index point in the sliding window as follows.
Specifically, for any two index points, an association score between them can be determined based on the connection attribute between the two index points and the number of their common neighbor index points. The connection attribute is one of: no connection, single connection in a directed graph, connection in an undirected graph, and double connection in a directed graph. The value corresponding to no connection is 0, the value corresponding to a single connection in a directed graph is 1, and the value corresponding to a connection in an undirected graph or a double connection in a directed graph is 2.
A common neighbor index point of two index points is an index point that is a neighbor of both of them, where the connecting edges between the neighbor index point and the two index points point from the neighbor index point to each of the two index points.
For example, as shown in fig. 9, index point 1 and index point 2 are each connected to index point 3, where the connecting edge between index point 3 and index point 1 points from index point 3 to index point 1, and the connecting edge between index point 3 and index point 2 points from index point 3 to index point 2. Index point 3 is therefore a common neighbor index point of index point 1 and index point 2.
Summing the values corresponding to the connection attributes between the two index points and the number of the common neighbor index points to obtain the association score between the two index points, which is specifically shown in the following formula (1):
$$S(u,v) = S_s(u,v) + S_n(u,v) \qquad (1)$$
where $S(u,v)$ represents the association score between index point $u$ and index point $v$, $S_s(u,v)$ represents the number of common neighbor index points of index point $u$ and index point $v$, and $S_n(u,v)$ represents the value corresponding to the connection attribute between index point $u$ and index point $v$.
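Formula (1) can be illustrated with a short Python sketch. The graph representation is an assumption made for the example: `out_edges` maps each index point to the set of index points its edges point to, and an undirected connection is modeled as a pair of directed edges. The function names are hypothetical.

```python
def connection_value(out_edges, u, v):
    """0 for no connection, 1 for a single directed edge, 2 for edges both ways."""
    return int(v in out_edges.get(u, set())) + int(u in out_edges.get(v, set()))


def common_neighbor_count(out_edges, u, v):
    """Number of index points whose edges point to both u and v."""
    return sum(
        1
        for w, targets in out_edges.items()
        if w not in (u, v) and u in targets and v in targets
    )


def association_score(out_edges, u, v):
    """Formula (1): common neighbor count plus connection attribute value."""
    return common_neighbor_count(out_edges, u, v) + connection_value(out_edges, u, v)
```

For a graph like the one in fig. 9, where index point 3 points to both index point 1 and index point 2 and the two are not directly connected, the score between index point 1 and index point 2 is 1.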
In the first iteration process, the association scores between the index points and the initial sorted index points in the sliding window are obtained by adopting the formula (1). Then, determining a target association degree between the index point and the sorted index point in the sliding window by adopting a transformation function, wherein the transformation function is specifically shown as the following formula (2):
[Formula (2) appears only as an image in the source and is not reproduced here.]
In formula (2), the symbols for the permutation function and for the target association degree are likewise images that are not reproduced; ω denotes the size of the sliding window.
For the second iteration process and each subsequent iteration process, the embodiment of the present application adopts at least the following implementations to determine the target association degree between each remaining index point and the sorted index points in the sliding window.
In the first embodiment, the following steps are performed for each index point:
The historical association degree between the index point and the sorted index points in the sliding window in the previous iteration process is acquired. Then, a first sub-association degree between the index point and the target index point newly added to the sliding window in the previous iteration process is determined. The target association degree between the index point and the sorted index points in the sliding window is determined based on the first sub-association degree and the historical association degree.
Specifically, the historical association degree corresponding to one index point refers to the target association degree of the index point in the last iteration process. If one index point is associated with the target index point newly added to the sliding window, the corresponding first sub-association degree is 1. If one index point is not associated with the target index point newly added to the sliding window, the corresponding first sub-association degree is 0. Of course, other values may be used to represent the first sub-relevance in the embodiments of the present application, and the present application is not limited to this.
And summing the historical relevance of the index point and the first sub-relevance to obtain the target relevance between the index point and the sorted index point in the sliding window.
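The update in the first embodiment amounts to one addition per unsorted index point. A minimal sketch follows, assuming the 0/1 first sub-association degree described above; `update_after_add` and the `associated` set of point pairs are hypothetical names, and the numbers in any call are illustrative rather than taken from the source.

```python
def update_after_add(history, associated, added):
    """Target degree = historical degree + first sub-association degree.

    history:    dict of unsorted point -> target degree from the previous iteration
    associated: set of frozenset pairs of associated index points
    added:      target index point newly added to the sliding window
    """
    return {
        p: f + (1 if frozenset((p, added)) in associated else 0)
        for p, f in history.items()
        if p != added  # the added point is no longer unsorted
    }
```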
For example, the size of the sliding window is set to be 3 index points, and the obtained plurality of unordered index points are index point 1, index point 2, index point 3, index point 4, index point 5, index point 6, and index point 7.
Referring to fig. 10, a schematic diagram of a processing result after the 3 rd iteration process is finished, where the sorted index points include index point 1, index point 3, and index point 6, and the target association sorting result of the remaining unsorted index points is: index point 5, index point 2, index point 4, and index point 7, where index point 6 is an index point newly added to the sliding window.
Before index point 6 is added to the sliding window, the target association degree between index point 2 and the sorted index points in the sliding window (index point 1 and index point 3) is $F_{32}$; the target association degree between index point 4 and those sorted index points is $F_{34}$; the target association degree between index point 5 and those sorted index points is $F_{35}$; the target association degree between index point 6 and those sorted index points is $F_{36}$; and the target association degree between index point 7 and those sorted index points is $F_{37}$.
When the 4th iteration process is executed, the target association degrees obtained in the 3rd iteration process can all be used as historical association degrees. For index point 5, the historical association degree $F_{35}$ between index point 5 and the sorted index points in the sliding window in the previous iteration process is acquired, and the first sub-association degree $T(5,6)$ between index point 5 and index point 6 is determined. $F_{35}$ and $T(5,6)$ are summed to obtain the target association degree $F_{45}$ of index point 5 in the 4th iteration process.
For index point 2, the historical association degree $F_{32}$ between index point 2 and the sorted index points in the sliding window in the previous iteration process is acquired, and the first sub-association degree $T(2,6)$ between index point 2 and index point 6 is determined. $F_{32}$ and $T(2,6)$ are summed to obtain the target association degree $F_{42}$ of index point 2 in the 4th iteration process.
For index point 4, the historical association degree $F_{34}$ between index point 4 and the sorted index points in the sliding window in the previous iteration process is acquired, and the first sub-association degree $T(4,6)$ between index point 4 and index point 6 is determined. $F_{34}$ and $T(4,6)$ are summed to obtain the target association degree $F_{44}$ of index point 4 in the 4th iteration process.
For index point 7, the historical association degree $F_{37}$ between index point 7 and the sorted index points in the sliding window in the previous iteration process is acquired, and the first sub-association degree $T(7,6)$ between index point 7 and index point 6 is determined. $F_{37}$ and $T(7,6)$ are summed to obtain the target association degree $F_{47}$ of index point 7 in the 4th iteration process.
Sorting the unordered index points according to the sequence of the target relevance degrees from large to small, and obtaining a target relevance sorting result as follows: index point 2, index point 5, index point 4, and index point 7, then add index point 2 to the sliding window, slide index point 1 out of the sliding window, and obtain the index point ranking result after the 4 th iterative process, specifically as shown in fig. 11, the ranked index points include index point 1, index point 3, index point 6, and index point 2, wherein index point 1 has slid out of the sliding window, index point 3, index point 6, and index point 2 are located in the sliding window, and the unsorted index points include index point 5, index point 4, and index point 7.
In the embodiment of the application, only one new index point was added to the sliding window in the previous iteration process, and the sorted index points already in the sliding window are unchanged. The historical association degree of each unsorted index point from the previous iteration process can therefore be reused: only the first sub-association degree between the unsorted index point and the index point newly added to the sliding window needs to be calculated, and the target association degree of the unsorted index point in the current iteration process is obtained from the historical association degree and the first sub-association degree. This reduces the amount of calculation and improves the efficiency of sorting the index points.
In the second embodiment, for each index point, the following steps are respectively performed:
The historical association degree between the index point and the sorted index points in the sliding window in the previous iteration process is acquired. Then, a first sub-association degree between the index point and the target index point newly added to the sliding window in the previous iteration process is determined.
If the earliest added sorted index point was removed from the sliding window after the previous iteration process, a second sub-association degree between the index point and that earliest added sorted index point is determined. The target association degree between the index point and the sorted index points in the sliding window is then determined based on the first sub-association degree, the second sub-association degree, and the historical association degree.
Specifically, if there is an association between one index point and the sorted index point removed from the sliding window, the corresponding second sub-association degree is 1. If one index point is not associated with the sorted index point removed from the sliding window, the corresponding second sub-association degree is 0. Of course, other values may be used to represent the second sub-relevance in the embodiments of the present application, and the present application is not limited to this.
And summing the historical relevance of the index point and the first sub-relevance to obtain an intermediate relevance. And then, calculating the difference between the intermediate association degree and the second sub-association degree to obtain the target association degree between the index point and the sorted index point in the sliding window in the iteration process.
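The second embodiment adds one subtraction to the same update. In the sketch below, `update_after_add_and_remove` and `associated` are hypothetical names, and both sub-association degrees are assumed to take the values 0 or 1 as described above.

```python
def update_after_add_and_remove(history, associated, added, removed):
    """Target degree = historical degree + T(p, added) - R(p, removed).

    history:    dict of unsorted point -> target degree from the previous iteration
    associated: set of frozenset pairs of associated index points
    added:      index point newly added to the sliding window
    removed:    sorted index point that slid out of the sliding window
    """
    def sub(p, q):
        return 1 if frozenset((p, q)) in associated else 0

    return {
        p: f + sub(p, added) - sub(p, removed)
        for p, f in history.items()
        if p != added
    }
```

For instance, with historical degrees 5, 5, and 4 for index points 5, 4, and 7, an association between index point 4 and the added index point 2 raises the degree of index point 4 to 6, and an assumed association between index point 7 and the removed index point 1 lowers the degree of index point 7 to 3.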
For example, see fig. 11, which is a schematic diagram of a processing result after the 4 th iteration process is completed, in the diagram, the sorted index points include index point 1, index point 3, index point 6, and index point 2, where index point 1 has slid out of the sliding window, index point 3, index point 6, and index point 2 are located in the sliding window, and the unsorted index points include index point 5, index point 4, and index point 7.
In the 5th iteration process, the target association degrees obtained in the 4th iteration process can all be used as historical association degrees. For index point 5, the historical association degree $F_{45}$ between index point 5 and the sorted index points in the sliding window in the previous iteration process is acquired, and then the first sub-association degree $T(5,2)$ between index point 5 and index point 2 and the second sub-association degree $R(5,1)$ between index point 5 and index point 1 are determined. $F_{45}$ and $T(5,2)$ are summed, and $R(5,1)$ is subtracted, to obtain the target association degree $F_{55}$ of index point 5 in the 5th iteration process.
For index point 4, the historical association degree $F_{44}$ between index point 4 and the sorted index points in the sliding window in the previous iteration process is acquired, and then the first sub-association degree $T(4,2)$ between index point 4 and index point 2 and the second sub-association degree $R(4,1)$ between index point 4 and index point 1 are determined. $F_{44}$ and $T(4,2)$ are summed, and $R(4,1)$ is subtracted, to obtain the target association degree $F_{54}$ of index point 4 in the 5th iteration process.
For index point 7, the historical association degree $F_{47}$ between index point 7 and the sorted index points in the sliding window in the previous iteration process is acquired, and then the first sub-association degree $T(7,2)$ between index point 7 and index point 2 and the second sub-association degree $R(7,1)$ between index point 7 and index point 1 are determined. $F_{47}$ and $T(7,2)$ are summed, and $R(7,1)$ is subtracted, to obtain the target association degree $F_{57}$ of index point 7 in the 5th iteration process.
Sorting the unordered index points according to the sequence of the target relevance degrees from large to small, and obtaining a target relevance sorting result as follows: index point 4, index point 5, and index point 7, then add index point 4 to the sliding window, and simultaneously slide index point 3 out of the sliding window, to obtain the index point sorting result after the 5 th iteration process. Specifically, as shown in fig. 12, the sorted index points include index point 1, index point 3, index point 6, index point 2, and index point 4, where the index points 1 and 3 have slid out of the sliding window, the index points 6, 2, and 4 are located in the sliding window, and the unsorted index points include index point 5 and index point 7.
In the embodiment of the application, in the previous iteration process one new index point was added to the sliding window and one sorted index point was removed from it, while the other sorted index points in the sliding window are unchanged. The historical association degree of each unsorted index point from the previous iteration process can therefore be reused: the first sub-association degree between the unsorted index point and the index point newly added to the sliding window and the second sub-association degree between the unsorted index point and the sorted index point removed from the sliding window are calculated, and the target association degree of the unsorted index point in the current iteration process is obtained from the historical association degree, the first sub-association degree, and the second sub-association degree. This reduces the amount of calculation and improves the efficiency of sorting the index points.
In the third embodiment, for each index point, the following steps are respectively performed:
The association score between the index point and each sorted index point in the sliding window is determined, and the target association degree between the index point and the sorted index points in the sliding window is then determined based on the obtained association scores.
Specifically, in each iteration process, for each unsorted index point, based on the connection attribute and the number of common neighbor index points between the unsorted index point and each sorted index point in the sliding window, an association score between the unsorted index point and each sorted index point is determined, specifically as shown in the above formula (1). And then, determining a target association degree between the unsorted index point and the sorted index point in the sliding window based on the obtained association score by adopting a conversion function shown in the formula (2).
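The recomputation in the third embodiment can be sketched as follows. Because formula (2) is not reproduced in the source text, the sketch assumes that the conversion function simply sums the association scores over the sliding window; that choice, the name `target_degree`, and the score table layout are all assumptions made for illustration.

```python
def target_degree(point, window, score):
    """Sum of association scores between `point` and every window point.

    score: dict mapping frozenset({u, v}) -> association score from formula (1);
           pairs that are missing from the table are treated as score 0.
    Assumption: the conversion function is a plain sum over the window.
    """
    return sum(score.get(frozenset((point, w)), 0) for w in window)
```

This variant touches every window point for every unsorted index point in each iteration, so it costs more than the incremental updates but does not depend on values carried over from earlier iterations.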
For example, see fig. 11, which is a schematic diagram of a processing result after the 4 th iteration process is completed, in the diagram, the sorted index points include index point 1, index point 3, index point 6, and index point 2, where index point 1 has slid out of the sliding window, index point 3, index point 6, and index point 2 are located in the sliding window, and the unsorted index points include index point 5, index point 4, and index point 7.
In the 5th iteration process, for index point 5, formula (1) is used to determine the association score $S(5,3)$ between index point 5 and index point 3, the association score $S(5,6)$ between index point 5 and index point 6, and the association score $S(5,2)$ between index point 5 and index point 2. Formula (2) and the obtained association scores are then used to determine the target association degree $F_{55}$ of index point 5 in the 5th iteration process.
For index point 4, formula (1) is used to determine the association score $S(4,3)$ between index point 4 and index point 3, the association score $S(4,6)$ between index point 4 and index point 6, and the association score $S(4,2)$ between index point 4 and index point 2. Formula (2) and the obtained association scores are then used to determine the target association degree $F_{54}$ of index point 4 in the 5th iteration process.
For index point 7, formula (1) is used to determine the association score $S(7,3)$ between index point 7 and index point 3, the association score $S(7,6)$ between index point 7 and index point 6, and the association score $S(7,2)$ between index point 7 and index point 2. Formula (2) and the obtained association scores are then used to determine the target association degree $F_{57}$ of index point 7 in the 5th iteration process.
Sorting the unordered index points according to the sequence of the target relevance degrees from large to small, and obtaining a target relevance sorting result as follows: index point 4, index point 5, and index point 7, then add index point 4 to the sliding window, and simultaneously slide index point 3 out of the sliding window, to obtain the index point sorting result after the 5 th iteration process. Specifically, as shown in fig. 12, the sorted index points include index point 1, index point 3, index point 6, index point 2, and index point 4, where the index points 1 and 3 have slid out of the sliding window, the index points 6, 2, and 4 are located in the sliding window, and the unsorted index points include index point 5 and index point 7.
In the embodiment of the application, in each iteration process, the association score between the unordered index point and the sorted index point is determined based on the connection attribute between the unordered index point and the sorted index point in the sliding window and the number of common neighbor index points, and then the target association degree between the unordered index point and the sorted index point in the sliding window is determined based on the association score and the conversion function, so that the accuracy of the target association degree is ensured.
Optionally, in each iteration process, only one index point is newly added to the sliding window, one sorted index point is removed when the sliding window overflows, and other sorted index points in the sliding window are not changed, so that the change of the target associated sorting result of each unsorted index point in the iteration process is not large compared with the previous iteration process. Therefore, the historical associated sorting result of each unordered index point in the last iteration process can be adjusted to obtain the target associated sorting result in the current iteration process.
In view of this, in the embodiment of the present application, the historical association sorting result of the index points from the previous iteration process is obtained. The historical association sorting result is then adjusted based on the target association degrees to obtain the target association sorting result.
Specifically, the target index points newly added to the sliding window in the last iteration process are removed from the target association sorting result obtained in the last iteration process, so as to obtain a historical association sorting result.
In the iteration process, for each index point (hereinafter referred to as a target unsorted index point), when a first sub-association degree between the target unsorted index point and a target index point newly added to the sliding window is 1, adding 1 to the target association degree corresponding to the target unsorted index point, then judging whether a first replacement index point exists in the historical association sorting result, if so, replacing the positions of the target unsorted index point and the first replacement index point in the historical association sorting result, finishing one sorting, otherwise, not adjusting the historical association sorting result.
When the sorting rule corresponding to the historical association sorting result is as follows: when the target relevance is sorted from high to low, the first replacement index point needs to satisfy the following conditions:
The target association degree of the first replacement index point is less than the target association degree of the target unsorted index point, and the target association degree of the unsorted index point ranked one position before the first replacement index point is greater than or equal to the target association degree of the target unsorted index point.
When the first sub-association degree between the target unsorted index point and the target index point newly added to the sliding window is 0, the historical association sorting result is not adjusted. After this adjustment has been performed for each unsorted index point, the sorting complexity is O(M log N), where M denotes the number of edges associated with the index point newly added to the sliding window, and N denotes the number of unsorted index points.
Similarly, when the second sub-association degree between the target unsorted index point and the sorted index point sliding out of the sliding window in the last iteration process is 1, subtracting 1 from the target association degree corresponding to the target unsorted index point, then judging whether a second replacement index point exists in the historical association sorting result, if so, replacing the positions of the target unsorted index point and the second replacement index point in the historical association sorting result, finishing one sorting, and if not, not adjusting the historical association sorting result.
When the sorting rule corresponding to the historical association sorting result is as follows: when the target relevance is sorted from high to low, the second replacement index point needs to satisfy the following conditions:
The target association degree of the second replacement index point is greater than the target association degree of the target unsorted index point, and the target association degree of the unsorted index point ranked one position after the second replacement index point is less than or equal to the target association degree of the target unsorted index point.
When the second sub-association degree between the target unsorted index point and the sorted index point that slid out of the sliding window in the previous iteration process is 0, the historical association sorting result is not adjusted. After this adjustment has been performed for each unsorted index point, the sorting complexity is O(L log N), where L denotes the number of edges associated with the sorted index point that slid out of the sliding window, and N denotes the number of unsorted index points.
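The two adjustment rules can be sketched as a pair of swap operations on a ranking kept in descending order of target association degree. This is a sketch under assumptions: `ranking`, `pos`, and `degree` are hypothetical data structures (a list, a point-to-position map, and a point-to-degree map), the replacement index point is located with a hand-written binary search, and the replacement index point is looked for only before the target point when increasing and only after it when decreasing, so that the swap preserves the descending order.

```python
def _swap(ranking, pos, i, j):
    """Exchange two ranking positions and keep the position map in sync."""
    ranking[i], ranking[j] = ranking[j], ranking[i]
    pos[ranking[i]], pos[ranking[j]] = i, j


def increase(ranking, pos, degree, p):
    """Add 1 to p's degree, then swap p with the first replacement index point."""
    degree[p] += 1
    lo, hi = 0, pos[p]
    while lo < hi:  # first position before p whose degree is below p's new degree
        mid = (lo + hi) // 2
        if degree[ranking[mid]] < degree[p]:
            hi = mid
        else:
            lo = mid + 1
    if lo < pos[p]:
        _swap(ranking, pos, lo, pos[p])


def decrease(ranking, pos, degree, p):
    """Subtract 1 from p's degree, then swap p with the second replacement index point."""
    degree[p] -= 1
    lo, hi = pos[p] + 1, len(ranking)
    while lo < hi:  # first position after p whose degree is not above p's new degree
        mid = (lo + hi) // 2
        if degree[ranking[mid]] > degree[p]:
            lo = mid + 1
        else:
            hi = mid
    if lo - 1 > pos[p]:
        _swap(ranking, pos, lo - 1, pos[p])
```

Because each change is exactly one unit, a single swap with the boundary of the block of equal degrees is enough to restore the order, and each adjustment costs one binary search.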
For example, see fig. 13, which is a schematic diagram of a processing result after the 4 th iteration process is completed, in the diagram, the sorted index points include index point 1, index point 3, index point 6, and index point 2, where index point 1 has slid out of the sliding window; the index point 3, the index point 6 and the index point 2 are positioned in the sliding window; unordered index points include: index point 5, index point 4, index point 7. In the 4 th iteration process, the target relevance F corresponding to the index point 5455, the index point 4 corresponds to the target relevance F 445, the index point 7 corresponds to the target relevance F47=4。
In the 4 th iteration process, newly adding the index point 2 to the sliding window, and in the 5 th iteration process, performing the following operations:
for index point 5, the historical association degree $F_{45} = 5$ between index point 5 and the sorted index points in the sliding window in the last iteration process is acquired. Since the first sub-association degree of index point 5 and index point 2 is T(5, 2) = 0, the historical association sorting result is not adjusted after index point 2 is newly added to the sliding window (the historical association sorting result is: index point 5, index point 4, index point 7), and the target association degree of index point 5 in the 5th iteration process is $F_{55} = 5$.
For index point 4, the historical association degree $F_{44} = 5$ between index point 4 and the sorted index points in the sliding window in the last iteration process is acquired. Since the first sub-association degree of index point 4 and index point 2 is T(4, 2) = 1, the target association degree of index point 4 in the 5th iteration process is $F_{54} = 5 + 1 = 6$. At this time, as shown in fig. 14a, the target association degrees corresponding to index point 5, index point 4, and index point 7 are 5, 6, and 4, respectively.
The first replacement index point in the historical association sorting result, obtained by binary search, is index point 5, so the positions of index point 5 and index point 4 in the historical association sorting result are swapped. The result of the swap is shown in fig. 14b, and the updated historical association sorting result is: index point 4, index point 5, index point 7, with corresponding target association degrees of 6, 5, and 4, respectively.
For index point 7, the historical association degree $F_{47} = 4$ between index point 7 and the sorted index points in the sliding window in the last iteration process is acquired. Since the first sub-association degree of index point 7 and index point 2 is T(7, 2) = 0, the historical association sorting result is not adjusted after index point 2 is newly added to the sliding window, and the target association degree of index point 7 in the 5th iteration process is $F_{57} = 4$.
In addition, during the 4 th iteration, the index point 1 is also removed from the sliding window, and during the 5 th iteration, the following operations are performed:
for index point 4, since the second sub-association degree of index point 4 and index point 1 is R(4, 1) = 0, after index point 1 is removed from the sliding window, the historical association sorting result is not adjusted and the target association degree $F_{54}$ is not updated.
For index point 5, since the second sub-association degree of index point 5 and index point 1 is R(5, 1) = 1, in the 5th iteration process the target association degree $F_{55}$ of index point 5 is reduced by 1 to obtain the updated target relevance $F_{55} = 4$. At this time, as shown in fig. 14c, the target association degrees corresponding to index point 4, index point 5, and index point 7 are 6, 4, and 4, respectively. No second replacement index point is found in the historical association sorting result by binary search, so the historical association sorting result is not adjusted.
For index point 7, since the second sub-association degree of index point 7 and index point 1 is R(7, 1) = 0, after index point 1 is removed from the sliding window, the historical association sorting result is not adjusted and the target association degree $F_{57}$ is not updated.
After the adjustment is finished, obtaining a target association sequencing result as follows: index point 4, index point 5, and index point 7, then add index point 4 to the sliding window, and simultaneously slide index point 3 out of the sliding window, to obtain the index point sorting result after the 5 th iteration process. Specifically, as shown in fig. 12, the sorted index points include index point 1, index point 3, index point 6, index point 2, and index point 4, where the index points 1 and 3 have slid out of the sliding window, the index points 6, 2, and 4 are located in the sliding window, and the unsorted index points include index point 5 and index point 7.
In the embodiment of the application, in each iteration process, when an unsorted index point is newly added to a sliding window or the sorted index point is removed from the sliding window, the target relevance degree of each associated unsorted index point is correspondingly updated once, and after the target relevance degree is updated once, a historical associated sorting result is adjusted in a once replacement mode.
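The single-swap adjustment described above can be illustrated with a minimal Python sketch. It assumes integer association degrees that change by exactly 1 per update and a list kept in descending order; the class name `AssociationOrder` and its methods are hypothetical names chosen for illustration and do not appear in the application.

```python
class AssociationOrder:
    """Keeps unsorted index points ordered by descending target association degree."""

    def __init__(self, scores):
        # scores: dict mapping index point id -> current target association degree
        self.score = dict(scores)
        # sorted() is stable, so ties keep the insertion order of `scores`
        self.order = sorted(self.score, key=lambda p: -self.score[p])
        self.pos = {p: i for i, p in enumerate(self.order)}

    def _first_at_most(self, value):
        """Binary search: leftmost position whose degree is <= value."""
        lo, hi = 0, len(self.order)
        while lo < hi:
            mid = (lo + hi) // 2
            if self.score[self.order[mid]] <= value:
                hi = mid
            else:
                lo = mid + 1
        return lo

    def _swap(self, i, j):
        a, b = self.order[i], self.order[j]
        self.order[i], self.order[j] = b, a
        self.pos[a], self.pos[b] = j, i

    def increment(self, p):
        """A neighbor of p entered the sliding window (first sub-association = 1)."""
        old = self.score[p]
        j = self._first_at_most(old)          # front of the run of equal degrees
        self._swap(self.pos[p], j)            # one swap restores the order
        self.score[p] = old + 1

    def decrement(self, p):
        """A neighbor of p left the sliding window (second sub-association = 1)."""
        old = self.score[p]
        j = self._first_at_most(old - 1) - 1  # back of the run of equal degrees
        self._swap(self.pos[p], j)
        self.score[p] = old - 1


# Worked example from the 5th iteration: index points 5, 4, 7 with degrees 5, 5, 4.
order = AssociationOrder({5: 5, 4: 5, 7: 4})
order.increment(4)   # index point 2 joined the window and T(4, 2) = 1
order.decrement(5)   # index point 1 left the window and R(5, 1) = 1
print(order.order, order.score)
```

Because a degree only ever moves by 1, moving the updated index point to the front or back of its run of equal degrees is enough to keep the list sorted, which is why one binary search and one swap per update suffice in this sketch.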
It should be noted that, in the embodiment of the present application, the way of adjusting the historical association sorting result to obtain the target association sorting result is not limited to the one described above. Alternatively, the target association degree of an unsorted index point may first be determined based on both its first sub-association degree with the newly added index point and its second sub-association degree with the removed sorted index point, and only then is it judged whether the historical association sorting result needs to be adjusted; if so, the sorting is completed by swapping the positions of the unsorted index point and another unsorted index point in the historical association sorting result. The present application is not particularly limited in this respect.
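Under this reading, the update of one unsorted index point between two consecutive iterations can be summarized in a single expression. The notation $F_t(p)$, $a_{t-1}$ (the index point added to the sliding window in the previous iteration) and $r_{t-1}$ (the sorted index point removed from it, if any) is introduced here for illustration only; the second term on the right is taken as 0 when nothing was removed.

```latex
F_{t}(p) = F_{t-1}(p) + T(p, a_{t-1}) - R(p, r_{t-1})

% Check against the 5th iteration of the example (a_4 = index point 2, r_4 = index point 1):
% F_5(4) = 5 + 1 - 0 = 6
% F_5(5) = 5 + 0 - 1 = 4
% F_5(7) = 4 + 0 - 0 = 4
```

These values agree with the target association degrees 6, 4, and 4 given for index point 4, index point 5, and index point 7 in the example.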
In order to better explain the embodiment of the present application, the offline index point sorting method provided by the embodiment of the present application is described below with reference to a search scenario. The search scenario includes an offline data processing stage and an online search stage, both executed by a computer device, which may be the terminal device 101 and/or the server 102 shown in fig. 1. As shown in fig. 15, the offline data processing stage includes constructing the HNSW graph and saving the index points, and the online search stage includes acquiring search conditions, analyzing the search conditions, and searching online. Each stage is described below:
first, a process of constructing a HNSW graph is described, which includes the following steps: initializing the HNSW graph, and then adding the index points to the initialized HNSW graph in an iterative manner until an iteration ending condition is met to obtain the constructed HNSW graph, wherein the iteration ending condition can be that all the index points are added to the HNSW graph. Each iteration process is shown in fig. 16, and includes the following steps:
step S1601, an index point q to be added is obtained.
In step S1602, a target level i into which the index point q falls is determined by a random function.
The HNSW graph comprises a plurality of hierarchies, each corresponding to a hierarchy number. The hierarchy numbers decrease from top to bottom, and the number of index points contained in each hierarchy increases from top to bottom. The target level i is a level in the HNSW graph with 0 ≤ i ≤ k, where 0 represents the level number of the lowest level in the HNSW graph and k represents the level number of the highest level in the HNSW graph.
In step S1603, it is determined whether i is less than k, if so, step S1604 is performed, otherwise, step S1608 is performed.
In step S1604, the current processing level Kc is set to k.
Step S1605, find the index point x closest to the index point q in the current processing hierarchy.
In step S1606, the next hierarchy is entered via the index point x, and the current processing hierarchy Kc is set to Kc-1.
Step S1607, determine whether Kc is equal to i, if yes, execute step S1608, otherwise execute step S1605.
Step S1608, find the neighbor index point set of the obtained index point q in the current processing level Kc.
Step S1609, the index point q is inserted into the current processing level Kc, and the index point q is connected with each neighbor index point in the neighbor index point set.
In step S1610, the current processing level Kc is set to Kc-1.
In step S1611, it is determined whether Kc is equal to 0, if yes, step S1612 is performed, otherwise step S1608 is performed.
In step S1612, the addition of the index point q is completed.
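The insertion flow of steps S1601 to S1612 can be sketched in Python as follows. This is a simplified illustration under several assumptions that are not taken from the application: each level is a dictionary from index point to neighbor set, distances are Euclidean, the neighbor set is chosen from the closest point x and its direct neighbors, existing neighbor lists are never pruned, and insertion continues down to and including level 0. The target level is passed in as a parameter rather than drawn by the random function of step S1602. The names `greedy_closest` and `add_index_point` are hypothetical.

```python
import math


def greedy_closest(layer, vectors, q_vec, entry):
    """Follow edges inside one level toward q_vec until no neighbor is closer."""
    current = entry
    while True:
        best = min(layer[current],
                   key=lambda p: math.dist(vectors[p], q_vec),
                   default=current)
        if math.dist(vectors[best], q_vec) >= math.dist(vectors[current], q_vec):
            return current
        current = best


def add_index_point(layers, vectors, q, q_vec, target_level, max_neighbors=2):
    """layers[i] maps each index point of level i to its neighbor set; layers[0] is the bottom."""
    vectors[q] = q_vec
    entry = None
    for kc in range(len(layers) - 1, -1, -1):
        layer = layers[kc]
        if entry is None:
            entry = next(iter(layer), None)   # any point of this level, if one exists
        # Closest existing index point x in the current level (S1605)
        x = None if entry is None else greedy_closest(layer, vectors, q_vec, entry)
        if kc <= target_level:
            # Neighbor set of q in this level (S1608), then insert and connect (S1609)
            candidates = [] if x is None else [x, *layer[x]]
            candidates.sort(key=lambda p: math.dist(vectors[p], q_vec))
            neighbors = candidates[:max_neighbors]
            layer[q] = set(neighbors)
            for p in neighbors:
                layer[p].add(q)
        entry = x                             # enter the next level via x (S1606)


layers = [{}, {}]          # level 0 (bottom) and level 1 (top)
vectors = {}
add_index_point(layers, vectors, "a", (0.0, 0.0), target_level=1)
add_index_point(layers, vectors, "b", (1.0, 0.0), target_level=0)
add_index_point(layers, vectors, "c", (5.0, 0.0), target_level=0)
print(layers)
```

In this sketch every index point reaches level 0, so each upper level remains a subset of the level below it, which matches the layered structure described for the HNSW graph.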
In one example, the HNSW graph obtained by the above method is shown in fig. 17. The HNSW graph includes 3 levels, where level 2 is the top level, level 1 is the middle level, and level 0 is the bottom level. Level 0 includes 8 index points, namely index point 1, index point 2, index point 3, index point 4, index point 5, index point 6, index point 7, and index point 8. Level 1 includes index points 1, 2, 6, and 8, which is 4 index points fewer than level 0. Level 2 includes index points 1 and 6, which is 2 index points fewer than level 1. In the searching process, for a search condition, index point 6 in level 2 is taken as the starting point, and a layer search is performed on each level from top to bottom until index point 3 in level 0 is reached, thereby obtaining the search result. It should be understood that fig. 17 is only an exemplary illustration, and in practical applications, the number of index points included in each level and the connection relationship between the index points are not limited thereto.
In this embodiment of the present application, all index points of the lowest hierarchy in the HNSW graph may be sorted to obtain an index point sorting result, and then all index points of the lowest hierarchy in the HNSW graph are stored in the index memory according to the index point sorting result, as shown in fig. 18, the method includes the following steps:
step S1801, acquiring all index points of the lowest hierarchy in the HNSW graph as unsorted index points.
Step S1802, select one unsorted index point from the plurality of unsorted index points as an initial sorted index point to add to the sliding window.
Step S1803, determining target association degrees between each of the reserved unsorted index points and the sorted index points in the sliding window.
Step S1804, sorting the unsorted index points according to the target relevance, and obtaining a target relevance sorting result.
Step S1805, based on the target association sorting result, selecting one unsorted index point from the unsorted index points as a sorted index point, and adding the sorted index point to the sliding window.
Step S1806, determine whether the number of the reserved unordered index points is 0, if yes, execute step S1809, otherwise execute step S1807.
Step S1807, determining whether the number of sorted index points in the sliding window is greater than a preset threshold, if so, executing step S1808, otherwise, executing step S1803.
In step S1808, the oldest added sorted index point is removed from the sliding window, and step S1803 is performed.
Step S1809, obtaining an index point sorting result, and storing a plurality of unordered index points in the index memory according to the index point sorting result.
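Steps S1801 to S1809 can be sketched in Python as below. The sketch assumes that the target association degree of an unsorted index point is the number of graph edges between it and the index points currently in the sliding window, that edges are undirected, and that ties are broken by the smallest identifier; these choices, and the function name `offline_order`, are illustrative assumptions. For brevity the next index point is found by a linear scan instead of the incrementally maintained association sorting result.

```python
from collections import deque


def offline_order(graph, start, window_size):
    """graph: dict mapping each index point to the set of its neighbors."""
    unsorted_points = set(graph) - {start}
    window = deque([start])                  # S1802: initial sorted index point
    result = [start]
    # S1803: association[p] = number of edges between p and points in the window
    association = {p: (1 if start in graph[p] else 0) for p in unsorted_points}

    while unsorted_points:                   # S1806: stop when nothing is left
        # S1804/S1805: highest association first, smallest id on ties (assumption)
        target = min(unsorted_points, key=lambda p: (-association[p], p))
        unsorted_points.remove(target)
        del association[target]
        window.append(target)
        result.append(target)
        for p in graph[target]:              # neighbors gain one window edge
            if p in association:
                association[p] += 1
        # S1807/S1808: evict the earliest added point when the window is too large
        if len(window) > window_size:
            oldest = window.popleft()
            for p in graph[oldest]:          # neighbors lose one window edge
                if p in association:
                    association[p] -= 1
    return result                            # S1809: storage order of the index points


graph = {1: {4}, 2: {3}, 3: {2, 4}, 4: {1, 3}}
print(offline_order(graph, start=1, window_size=2))
```

Only the neighbors of the index point entering or leaving the window are touched in each iteration, which reflects the idea of avoiding a full recomputation against all sorted index points.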
In the search process, assume that each index point represents an article and that the lowest level in the HNSW graph includes n index points. A layer search is performed on each level of the HNSW graph from the top level down to the bottom level; in the bottom level, the k neighbor index points of the search condition are found among the n index points, and the articles corresponding to these k neighbor index points are taken as the search results of the search condition.
In each search step, the candidate index point and a plurality of other index points stored contiguously with the candidate index point are obtained from the index memory (in the first search step, any index point in the top level of the HNSW graph is used as the candidate index point), and the other index points are stored in the CPU cache.
The distance between the candidate index point and the search condition is then calculated. Meanwhile, it is judged whether the CPU cache contains the neighbor index points of the candidate index point; if so, the neighbor index points are acquired from the CPU cache, otherwise they are acquired from the index memory. The distance between each neighbor index point and the search condition is then calculated, the index point closest to the search condition is selected from the candidate index point and its neighbor index points as the new candidate index point, and the subsequent search process continues in the same way.
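The effect of the storage order on memory accesses can be illustrated with a toy cost model in Python. It is only a sketch under stated assumptions: index points are stored in fixed-size contiguous blocks, reading a candidate loads its whole block into the cache, the cache holds only that block, and each neighbor outside the block costs one extra read from the index memory. The function name `count_memory_reads` and the parameter `block_size` are hypothetical.

```python
def count_memory_reads(layout, graph, visited_candidates, block_size):
    """Counts index memory reads for a sequence of candidate index points.

    layout: index point ids in storage order.
    graph: dict mapping each index point to the set of its neighbors.
    """
    slot = {p: i for i, p in enumerate(layout)}
    reads = 0
    for candidate in visited_candidates:
        reads += 1                                  # read the candidate's block
        block = slot[candidate] // block_size
        for neighbor in graph[candidate]:
            if slot[neighbor] // block_size != block:
                reads += 1                          # cache miss: read from index memory
    return reads


graph = {1: {4}, 2: {3}, 3: {2, 4}, 4: {1, 3}}
path = [1, 4, 3]                                    # candidates visited by one search
by_id = count_memory_reads([1, 2, 3, 4], graph, path, block_size=2)
by_association = count_memory_reads([1, 4, 3, 2], graph, path, block_size=2)
print(by_id, by_association)
```

In this small example, the layout that places connected index points next to each other needs fewer reads than the layout by identifier, which is the behavior the storage order is intended to produce.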
In the embodiment of the application, the index points are sorted based on the target relevance among the index points in the HNSW graph to obtain the sorting result of the index points, and then the index points are stored in the index memory according to the sorting result of the index points, so that the associated index points are stored in the index memory in a concentrated manner, and therefore, in the searching process of the HNSW graph, the associated index points can be obtained for calculation when the memory is accessed every time, the times of accessing the memory are reduced, the searching speed of the HNSW graph is improved, and the searching performance of the HNSW graph is optimized. And secondly, calculating the association degree between the index points and the sorted index points in the sliding window in the sorting process, avoiding calculating the association degree between the index points and all the sorted index points, and reducing the process of repeatedly calculating the association degree, thereby improving the efficiency of sorting the index points.
In addition, to verify the above effect of the offline index point sorting method in the embodiment of the present application, the method was tested on index point data on the order of ten million points. The test results show that, with the offline index point sorting method of the embodiment of the present application, sorting index point data on the order of ten million points takes only about 40 minutes, an efficiency improvement of 75 times compared with the existing offline index point sorting method. Moreover, after the index points are stored in the index memory according to the index point sorting result, the search time is reduced by 15 percent compared with before, while the recall rate is not affected.
Based on the same technical concept, an embodiment of the present application provides a schematic structural diagram of an index point offline sorting apparatus, as shown in fig. 19, where the apparatus 1900 includes:
a selecting module 1901, configured to select one index point from multiple index points of the search graph as an initial sorted index point to be added to the sliding window;
a sorting module 1902, configured to determine target association degrees between each reserved index point and the sorted index points in the sliding window; select a target index point from the reserved index points based on the target association degrees and add the target index point to the sliding window; and iteratively execute the step of determining the target association degree between each reserved index point and the sorted index points in the sliding window until all of the index points have been added to the sliding window;
a storage module 1903, configured to take the order in which the index points were added to the sliding window as the index point sorting result, and store the index points in an index memory according to the index point sorting result.
Optionally, the sorting module 1902 is further configured to:
after each iteration process, if the number of sorted index points in the sliding window is greater than a preset threshold, removing the oldest added sorted index points from the sliding window, wherein the preset threshold is determined based on a central processor cache.
Optionally, the sorting module 1902 is specifically configured to:
sorting the index points according to the target relevance to obtain a target relevance sorting result;
and selecting a target index point from the index points as an ordered index point based on the target association ordering result, and adding the ordered index point to the sliding window.
Optionally, the sorting module 1902 is specifically configured to:
aiming at each index point, the following steps are respectively executed:
acquiring historical association degree between an index point and the sorted index points in the sliding window in the last iteration process;
determining a first sub-association degree between the index point and a target index point newly added to the sliding window in the last iteration process;
and determining a target relevance between the index point and the sorted index points in the sliding window based on the first sub relevance and the historical relevance.
Optionally, the sorting module 1902 is specifically configured to:
if the sorted index point which is added earliest is removed from the sliding window after the last iteration process, determining a second sub-association degree between the index point and the sorted index point which is added earliest;
determining a target degree of association between the index point and the sorted index points in the sliding window based on the first degree of sub-association, the second degree of sub-association, and the historical degree of association.
Optionally, the sorting module 1902 is specifically configured to:
aiming at each index point, the following steps are respectively executed:
determining each sorted index point in the sliding window and the association score of each sorted index point and one index point;
and determining a target association degree between the index point and the sorted index points in the sliding window based on the obtained association scores.
Optionally, the sorting module 1902 is specifically configured to:
obtaining each index point, and corresponding historical association sequencing results in the last iteration process;
and adjusting the historical association sequencing result based on the association degree of each target to obtain a target association sequencing result.
Optionally, the index points in the target association sorting result are arranged according to the sequence of the target association degree from large to small;
the sorting module 1902 is specifically configured to:
and taking the first index point in the target association sorting result as a target index point, and adding the target index point to the sliding window.
Optionally, a search module 1904 is also included;
the search module 1904 is specifically configured to:
obtaining a candidate index point of a search condition from the index memory and at least one other index point continuously stored with the candidate index point;
storing the at least one other index point in a central processor cache;
if the at least one other index point comprises the neighbor index point of the candidate index point, acquiring the neighbor index point of the candidate index point from the cache of the central processor;
determining a search result of the search condition based on the candidate index point and a neighbor index point of the candidate index point.
In the embodiment of the application, the index points are sorted based on the target relevance among the index points to obtain the sorting result of the index points, and then the index points are stored in the index memory according to the sorting result of the index points, so that the associated index points are stored in the index memory in a set, and therefore, in the searching process, the associated index points can be obtained for calculation when the memory is accessed every time, the times of accessing the memory are reduced, and the searching speed is further improved. And secondly, calculating the association degree between the index points and the sorted index points in the sliding window in the sorting process, avoiding calculating the association degree between the index points and all the sorted index points, and reducing the process of repeatedly calculating the association degree, thereby improving the efficiency of sorting the index points.
Based on the same technical concept, the embodiment of the present application provides a computer device, which may be the terminal device and/or the server shown in fig. 1, as shown in fig. 20, including at least one processor 2001 and a memory 2002 connected to the at least one processor, and a specific connection medium between the processor 2001 and the memory 2002 is not limited in the embodiment of the present application, and the processor 2001 and the memory 2002 are connected through a bus in fig. 20 as an example. The bus may be divided into an address bus, a data bus, a control bus, etc.
In the embodiment of the present application, the memory 2002 stores instructions executable by the at least one processor 2001, and the at least one processor 2001 may execute the steps of the above-mentioned offline ordering method for index points by executing the instructions stored in the memory 2002.
The processor 2001 is a control center of the computer device, and can connect various parts of the computer device by using various interfaces and lines, and implement the index point sorting by running or executing the instructions stored in the memory 2002 and calling the data stored in the memory 2002. Optionally, the processor 2001 may include one or more processing units, and the processor 2001 may integrate an application processor and a modem processor, wherein the application processor mainly handles an operating system, a user interface, an application program, and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor described above may not be integrated into the processor 2001. In some embodiments, the processor 2001 and the memory 2002 may be implemented on the same chip, or in some embodiments, they may be implemented separately on separate chips.
The processor 2001 may be a general-purpose processor such as a Central Processing Unit (CPU), a digital signal processor, an Application Specific Integrated Circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof, configured to implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present Application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.
The memory 2002, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 2002 may include at least one type of storage medium, and may include, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read Only Memory (PROM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and so on. The memory 2002 may also be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer device. The memory 2002 in the embodiments of the present application may also be circuitry or any other device capable of performing a storage function for storing program instructions and/or data.
Based on the same inventive concept, embodiments of the present application provide a computer-readable storage medium, which stores a computer program executable by a computer device, and when the program runs on the computer device, the computer device is caused to execute the steps of the above-mentioned offline ordering method for index points.
Based on the same inventive concept, embodiments of the present application provide a computer program product comprising a computer program stored on a computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer device, cause the computer device to perform the steps of the above-mentioned method for offline sorting of index points.
It should be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (13)

1. An offline sorting method for index points is characterized by comprising the following steps:
selecting an index point from a plurality of index points of a search graph as an initial sorted index point to be added to a sliding window;
determining target association degrees between each reserved index point and the sorted index points in the sliding window;
selecting target index points from the reserved index points based on the target relevance degrees and adding the target index points to the sliding window;
iteratively executing the step of determining the target association degree between each reserved index point and the sorted index points in the sliding window until the index points are added to the sliding window;
and taking the order in which the index points were added to the sliding window as an index point sorting result, and storing the index points in an index memory according to the index point sorting result.
2. The method of claim 1, further comprising:
after each iteration process, if the number of sorted index points in the sliding window is greater than a preset threshold, removing the oldest added sorted index points from the sliding window, wherein the preset threshold is determined based on a central processor cache.
3. The method of claim 1, wherein selecting target index points from the retained index points based on the respective target relevance to add to the sliding window comprises:
sorting the index points according to the target relevance to obtain a target relevance sorting result;
and selecting a target index point from the index points as an ordered index point based on the target association ordering result, and adding the ordered index point to the sliding window.
4. The method of claim 3, wherein determining a target degree of association between each retained index point and an ordered index point in the sliding window comprises:
aiming at each index point, the following steps are respectively executed:
acquiring historical association degree between an index point and the sorted index points in the sliding window in the last iteration process;
determining a first sub-association degree between the index point and a target index point newly added to the sliding window in the last iteration process;
and determining a target relevance between the index point and the sorted index points in the sliding window based on the first sub relevance and the historical relevance.
5. The method of claim 4, wherein determining the target degree of association between the one index point and the sorted index points in the sliding window based on the first sub-degree of association and the historical degree of association comprises:
if the sorted index point which is added earliest is removed from the sliding window after the last iteration process, determining a second sub-association degree between the index point and the sorted index point which is added earliest;
determining a target degree of association between the index point and the sorted index points in the sliding window based on the first degree of sub-association, the second degree of sub-association, and the historical degree of association.
6. The method of claim 3, wherein determining the target degree of association between each retained index point and the sorted index points in the sliding window comprises:
performing the following steps for each retained index point:
determining an association score between the index point and each sorted index point in the sliding window; and
determining the target degree of association between the index point and the sorted index points in the sliding window based on the obtained association scores.
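A minimal sketch of the direct computation in claim 6 is given below, assuming that the association scores are combined by summation. The function `target_degree`, the score `edge_score`, and the toy graph `GRAPH` are hypothetical names introduced for illustration; the claim does not fix the score or the way scores are aggregated.

```python
from collections import deque
from typing import Callable, Iterable

# Hypothetical toy search graph: index point -> set of neighbor index points.
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}


def edge_score(u: str, v: str) -> int:
    """Assumed association score: 1 if v is a neighbor of u, else 0."""
    return 1 if v in GRAPH[u] else 0


def target_degree(
    point: str,
    window: Iterable[str],
    score: Callable[[str, str], int] = edge_score,
) -> int:
    """Sum the association scores between one retained point and every
    sorted point currently held in the sliding window."""
    return sum(score(point, sorted_point) for sorted_point in window)


# A sliding window of capacity 2 holding the sorted points "a" and "b".
window = deque(["a", "b"], maxlen=2)
degree_c = target_degree("c", window)
degree_d = target_degree("d", window)
```

In this form the cost per retained index point grows with the window size, because every sorted index point in the window is scored in each iteration.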
7. The method of claim 3, wherein sorting the retained index points by target degree of association to obtain the target association sorting result comprises:
acquiring the historical association sorting result obtained for the index points in the previous iteration; and
adjusting the historical association sorting result based on the respective target degrees of association to obtain the target association sorting result.
8. The method of claim 3, wherein the index points in the target association sorting result are arranged in descending order of target degree of association; and
selecting, based on the target association sorting result, a target index point from the retained index points as a sorted index point, and adding the sorted index point to the sliding window comprises:
taking the first index point in the target association sorting result as the target index point, and adding the target index point to the sliding window.
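The adjustment of claim 7 and the selection of claim 8 might be sketched as follows. This is one possible reading, assuming that "adjusting" means re-sorting the previous ranking with the updated degrees; Python's built-in `sorted` is stable and benefits from input that is already nearly ordered, so the previous ranking is reused as its input. The helper names `adjust_ranking` and `select_target` are hypothetical.

```python
from typing import Dict, List, Tuple


def adjust_ranking(previous_ranking: List[str], degrees: Dict[str, int]) -> List[str]:
    """Re-sort the previous ranking in descending order of target degree.

    The sort is stable, so points with equal degrees keep the relative
    order they had in the previous iteration.
    """
    return sorted(previous_ranking, key=lambda p: degrees[p], reverse=True)


def select_target(ranking: List[str]) -> Tuple[str, List[str]]:
    """Take the first (highest-degree) point as the target index point and
    return it together with the ranking of the points still retained."""
    return ranking[0], ranking[1:]


previous = ["c", "d", "e"]            # ranking from the previous iteration
degrees = {"c": 1, "d": 2, "e": 1}    # updated target degrees
ranking = adjust_ranking(previous, degrees)
target, remaining = select_target(ranking)
```

Here `remaining` serves as the historical association sorting result for the next iteration.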
9. The method of any one of claims 1 to 8, further comprising, after storing the plurality of index points in the index memory according to the index point sorting result:
acquiring, from the index memory, a candidate index point for a search condition and at least one other index point stored contiguously with the candidate index point;
storing the at least one other index point in a central processing unit (CPU) cache;
if the at least one other index point comprises a neighbor index point of the candidate index point, acquiring the neighbor index point of the candidate index point from the CPU cache; and
determining a search result for the search condition based on the candidate index point and the neighbor index point of the candidate index point.
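The effect described in claim 9 can be imitated in software with the toy model below. It is only a sketch under stated assumptions: a Python list stands in for the index memory, a fixed `block_size` stands in for the amount of contiguous data a single memory access brings into the CPU cache, and the names `fetch_block` and `split_neighbors` are hypothetical. Real caching is performed by hardware and is not controlled in this way.

```python
from typing import Dict, List, Sequence, Tuple


def fetch_block(index_memory: Sequence[str], position: int, block_size: int) -> List[str]:
    """Model one memory access: return the candidate at `position` together
    with the index points stored contiguously after it."""
    return list(index_memory[position:position + block_size])


def split_neighbors(
    index_memory: Sequence[str],
    neighbors_of: Dict[str, List[str]],
    candidate: str,
    block_size: int,
) -> Tuple[List[str], List[str]]:
    """Split the candidate's neighbors into those already in the modeled
    cache (hits) and those needing a further memory access (misses)."""
    position = list(index_memory).index(candidate)
    cached = set(fetch_block(index_memory, position, block_size))
    neighbors = neighbors_of[candidate]
    hits = [n for n in neighbors if n in cached]
    misses = [n for n in neighbors if n not in cached]
    return hits, misses


# Index points laid out according to a hypothetical index point sorting result.
INDEX_MEMORY = ["a", "b", "c", "d", "e", "f"]
NEIGHBORS_OF = {"a": ["b", "c", "f"]}

hits, misses = split_neighbors(INDEX_MEMORY, NEIGHBORS_OF, "a", block_size=3)
```

The more neighbors of a candidate are stored next to it, the more of them land in `hits`, which corresponds to fewer memory accesses during the search.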
10. An apparatus for offline sorting of index points, comprising:
a selection module configured to select one index point from a plurality of index points of a search graph as an initial sorted index point and add it to a sliding window;
a sorting module configured to: determine a target degree of association between each retained index point and the sorted index points in the sliding window; select a target index point from the retained index points based on the respective target degrees of association and add it to the sliding window; and iteratively perform the step of determining the target degree of association between each retained index point and the sorted index points in the sliding window until all of the plurality of index points have been added to the sliding window; and
a storage module configured to take the order in which the plurality of index points were added to the sliding window as an index point sorting result, and to store the plurality of index points in an index memory according to the index point sorting result.
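The three modules of claim 10 can be pictured end to end with the short sketch below. It is an assumption-laden illustration rather than the claimed apparatus: the association score is taken to be 1 for graph neighbors and 0 otherwise, the target degree is taken to be the sum of scores over the window, the window evicts its earliest-added point once it holds `window_size` points, and the function name `order_index_points` is hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def order_index_points(graph: Dict[str, Set[str]], start: str, window_size: int) -> List[str]:
    """Return the order in which index points are added to the sliding window."""

    def score(u: str, v: str) -> int:
        # Assumed association score: 1 if v is a neighbor of u, else 0.
        return 1 if v in graph[u] else 0

    # Selection module: the initial sorted index point enters the window.
    window = deque([start])
    order = [start]
    # Target degrees of the retained (not yet sorted) index points.
    degrees = {p: score(p, start) for p in graph if p != start}

    # Sorting module: repeat until every index point has been added.
    while degrees:
        # Highest degree wins; ties go to the first point in dict order.
        target = max(degrees, key=degrees.get)
        del degrees[target]
        evicted = window.popleft() if len(window) == window_size else None
        window.append(target)
        order.append(target)
        # Incremental update: add the new point's score, drop the evicted one's.
        for point in degrees:
            degrees[point] += score(point, target)
            if evicted is not None:
                degrees[point] -= score(point, evicted)

    # Storage module: the caller lays out index memory in this order.
    return order


TOY_GRAPH = {
    "a": {"d"},
    "b": {"c"},
    "c": {"b", "d"},
    "d": {"a", "c"},
}
index_memory = order_index_points(TOY_GRAPH, start="a", window_size=2)
```

In the toy graph the chain a, d, c, b is recovered, so each index point is stored next to its graph neighbor.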
11. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, performs the steps of the method of any one of claims 1 to 9.
12. A computer-readable storage medium storing a computer program executable by a computer device, wherein the program, when run on the computer device, causes the computer device to perform the steps of the method of any one of claims 1 to 9.
13. A computer program product comprising a computer program stored on a computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer device, cause the computer device to perform the steps of the method of any one of claims 1 to 9.
CN202111490990.8A 2021-12-08 2021-12-08 Method, device, equipment and storage medium for offline sorting of index points Pending CN114329135A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111490990.8A CN114329135A (en) 2021-12-08 2021-12-08 Method, device, equipment and storage medium for offline sorting of index points

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111490990.8A CN114329135A (en) 2021-12-08 2021-12-08 Method, device, equipment and storage medium for offline sorting of index points

Publications (1)

Publication Number Publication Date
CN114329135A true CN114329135A (en) 2022-04-12

Family

ID=81049789

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111490990.8A Pending CN114329135A (en) 2021-12-08 2021-12-08 Method, device, equipment and storage medium for offline sorting of index points

Country Status (1)

Country Link
CN (1) CN114329135A (en)

Similar Documents

Publication Publication Date Title
CN108304512B (en) Video search engine coarse sorting method and device and electronic equipment
CN110209922B (en) Object recommendation method and device, storage medium and computer equipment
CN106844664B (en) Time series data index construction method based on abstract
US8527564B2 (en) Image object retrieval based on aggregation of visual annotations
CN107545276A (en) Multi-view learning method combining low-rank representation and sparse regression
US20110179013A1 (en) Search Log Online Analytic Processing
CN105589929A (en) Image retrieval method and device
CN112989169A (en) Target object identification method, information recommendation method, device, equipment and medium
CN111159563A (en) Method, device and equipment for determining user interest point information and storage medium
CN108304585B (en) Result data selection method based on space keyword search and related device
US6910030B2 (en) Adaptive search method in feature vector space
CN111639230A (en) Similar video screening method, device, equipment and storage medium
CN111198961A (en) Commodity searching method and device and server
CN111428120B (en) Information determination method and device, electronic equipment and storage medium
CN110851708B (en) Negative sample extraction method, device, computer equipment and storage medium
CN111125158B (en) Data table processing method, device, medium and electronic equipment
CN112749296A (en) Video recommendation method and device, server and storage medium
CN115408618B (en) Point-of-interest recommendation method based on social relation fusion position dynamic popularity and geographic features
CN114329135A (en) Method, device, equipment and storage medium for offline sorting of index points
CN110705889A (en) Enterprise screening method, device, equipment and storage medium
CN114610960A (en) Real-time recommendation method based on item2vec and vector clustering
CN111708745B (en) Cross-media data sharing representation method and user behavior analysis method and system
CN111143582B (en) Multimedia resource recommendation method and device for updating association words in double indexes in real time
CN114170476A (en) Image retrieval model training method and device, electronic equipment and storage medium
CN114385931A (en) Method and device for obtaining recommendation form and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination