CN116501828B - Server-unaware vector query method and system based on unstructured data sets - Google Patents

Server-unaware vector query method and system based on unstructured data sets

Info

Publication number
CN116501828B
CN116501828B (application number CN202310763804.6A)
Authority
CN
China
Prior art keywords
vector
query
calculation
cluster
transmission
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310763804.6A
Other languages
Chinese (zh)
Other versions
CN116501828A (en)
Inventor
金鑫
刘譞哲
章梓立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN202310763804.6A
Publication of CN116501828A
Application granted
Publication of CN116501828B

Classifications

    • G06F16/31: Information retrieval of unstructured textual data; indexing; data structures therefor; storage structures
    • G06F16/334: Information retrieval of unstructured textual data; querying; query processing; query execution
    • G06F9/546: Interprogram communication; message passing systems or structures, e.g. queues
    • G06F2209/548: Indexing scheme relating to G06F9/54; queue
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a server-unaware vector query method and system based on unstructured data sets. The method, applied in the technical field of vector query, comprises the following steps: acquiring a batch query request, wherein the batch query request comprises a plurality of vector query requests; searching for the vector clusters corresponding to the batch query request to generate a query plan, wherein each vector cluster is divided into a plurality of balanced vector clusters; optimizing the query plan to eliminate redundant transmission and obtain an optimized query plan; acquiring transmission time information and calculation time information, and reordering the optimized query plan at the granularity of balanced vector clusters to obtain an optimal execution sequence; grouping the optimal execution sequence using a dynamic programming algorithm to obtain a grouping plan; and pushing each group into a global grouping queue for transmission and calculation according to the grouping plan, thereby obtaining the vector query result.

Description

Server-unaware vector query method and system based on unstructured data sets
Technical Field
The invention relates to the technical field of vector query, in particular to a server-unaware vector query method and system based on unstructured data sets.
Background
Vector retrieval refers to the technique of converting unstructured data into high-dimensional feature vectors and performing query, calculation, and storage on them. At present, vector retrieval technology is widely used in artificial intelligence fields such as face recognition, information retrieval, and recommendation systems. In the prior art, a graphics processing unit (GPU), as a highly parallelized coprocessor, is a natural choice for handling vector operations and performing vector query tasks. Because the GPU's video memory capacity is far smaller than that of the host memory, in order to better utilize the GPU's computing resources and overcome the shortage of video memory, practical vector retrieval applications often use host memory as extended storage for the GPU video memory.
However, in practical vector query applications, this scheme of combining the GPU with host memory greatly increases the data transmission overhead from host memory to GPU video memory, and the computing resources on the GPU cannot be fully utilized, so vector query efficiency is low, the required time is too long, and transmission and computing resources are wasted.
Therefore, it is necessary to develop a server-unaware vector query method and system based on unstructured data sets, so as to improve vector query efficiency and achieve higher computing performance and cost performance.
Disclosure of Invention
In view of the foregoing, embodiments of the present invention provide a server-unaware vector query method and system based on unstructured data sets to overcome or at least partially solve the foregoing problems.
The first aspect of the embodiment of the invention provides a server-unaware vector query method based on an unstructured data set, which comprises the following steps:
acquiring a batch query request, wherein the batch query request comprises a plurality of vector query requests;
searching for a plurality of vector clusters corresponding to the batch query request by utilizing an IVF index constructed offline on an unstructured data set, and generating a query plan, wherein the query plan represents: a transmission order in which the plurality of vector clusters are transmitted from host memory into the video memory of a GPU, and a calculation order in which the vector query requests are calculated on the vector clusters; wherein each of the vector clusters is divided into a plurality of balanced vector clusters;
optimizing the query plan, eliminating redundant transmission in the query plan, and obtaining an optimized query plan;
acquiring transmission time information and calculation time information, and reordering the optimized query plan at the granularity of balanced vector clusters to obtain an optimal execution sequence;
Grouping the optimal execution sequence by using a dynamic programming algorithm to obtain a grouping plan;
and pushing each group into a global grouping queue for transmission and calculation according to the grouping plan, and obtaining a vector query result.
The second aspect of the embodiment of the application also provides a server-unaware vector query system based on an unstructured data set, which comprises:
a declarative application programming interface for obtaining a batch of query requests, the batch of query requests including a plurality of vector query requests;
the vector database is used for searching for a plurality of vector clusters corresponding to the batch query request by utilizing an IVF index constructed offline on an unstructured data set, to generate a query plan, wherein the query plan represents: a transmission order in which the plurality of vector clusters are transmitted from host memory into the video memory of a GPU, and a calculation order in which the vector query requests are calculated on the vector clusters; wherein each of the vector clusters is divided into a plurality of balanced vector clusters;
the query plan optimizing module is used for optimizing the query plan, eliminating redundant transmission in the query plan and obtaining an optimized query plan;
the pipeline scheduler is used for acquiring transmission time information and calculation time information, and reordering the optimized query plan at the granularity of balanced vector clusters to obtain an optimal execution sequence;
The pipeline scheduler is further used for grouping the optimal execution sequence by using a dynamic programming algorithm to obtain a grouping plan;
and the GPU processor is used for pushing each group into a global grouping queue for transmission and calculation according to the grouping plan to obtain a vector query result.
The embodiments of the present application provide a server-unaware vector query method and system based on unstructured data sets, wherein the method comprises: acquiring a batch query request, wherein the batch query request comprises a plurality of vector query requests; searching for a plurality of vector clusters corresponding to the batch query request by utilizing an IVF index constructed offline on an unstructured data set, and generating a query plan, wherein the query plan represents: a transmission order in which the plurality of vector clusters are transmitted from host memory into the video memory of a GPU, and a calculation order in which the vector query requests are calculated on the vector clusters, and wherein each of the vector clusters is divided into a plurality of balanced vector clusters; optimizing the query plan, eliminating redundant transmission in the query plan, and obtaining an optimized query plan; acquiring transmission time information and calculation time information, and reordering the optimized query plan at the granularity of balanced vector clusters to obtain an optimal execution sequence; grouping the optimal execution sequence using a dynamic programming algorithm to obtain a grouping plan; and pushing each group into a global grouping queue for transmission and calculation according to the grouping plan, and obtaining a vector query result. On the one hand, the embodiments of the present application eliminate redundant transmission in the query plan by optimizing the query plan, improving transmission efficiency and saving transmission resources. On the other hand, the embodiments of the present application obtain the optimal execution sequence by reordering the query plan and group it using a dynamic programming algorithm, so that transmission and calculation proceed according to the determined optimal execution sequence and optimal grouping scheme, further improving vector query efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the steps of a server-unaware vector query method based on unstructured data sets according to an embodiment of the present invention;
FIG. 2 is a schematic illustration of an initial query plan provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of calculating time and space utilization of a stream processor in a GPU according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of query plan optimization results within a batch process according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of query plan optimization results between batch processes provided by an embodiment of the present invention;
FIG. 6 is a schematic diagram of reordering of a pipeline scheduler according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a dynamic kernel extended computing process provided by an embodiment of the present invention;
FIG. 8 is a schematic diagram of a query process of a vector query method according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of the structure of a server-unaware vector query system based on unstructured data sets according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings in the embodiments of the present invention. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Breakthroughs in deep learning techniques have enabled unstructured data to be converted into high-dimensional feature vectors, so vector retrieval is also widely used in artificial intelligence applications such as face recognition, information retrieval, and recommendation systems. With the dramatic increase in the amount of user data, the data set sizes of many vector retrieval workloads have now reached the level of hundreds of millions or even billions of vectors. In addition, the GPU, as a highly parallelized coprocessor, is a natural choice for handling vector operations and performing vector retrieval tasks. However, the GPU is more expensive than the CPU, and its video memory capacity is far smaller than that of the host memory, so when processing a large-scale vector data set, far more GPUs than the computation itself requires are often needed merely to hold such a large-scale data set, wasting GPU computing resources. In order to better utilize the GPU's computing resources and overcome the shortage of video memory, so that the GPU retains its performance and cost-performance advantage over the CPU on large-scale data sets, the most natural method is to use host memory as extended storage for the video memory.
However, the above technical scheme of combining the GPU with host memory has the following problems. On the one hand, there is severe redundant transmission in the data transfer from host memory to GPU video memory. When processing a batch of query requests (queries), different query requests may be computed on the same data; if calculation and data transmission follow the original computation pattern, the same data is transmitted multiple times within a batch of queries, which greatly increases the cost of data transmission. On the other hand, when an existing vector computing engine processes the computation of each query request on the data, the computing resources of the GPU cannot be fully utilized, resulting in low utilization of GPU computing resources.
In view of the above problems, embodiments of the present application provide a server-unaware vector query method and system based on unstructured data sets, so as to solve the above problems of redundant transmission and low utilization of computing resources and to achieve higher computing performance and cost performance. The vector query method provided by the embodiments of the present application is described in detail below through some embodiments and their application scenarios with reference to the accompanying drawings.
This embodiment proposes a server-unaware vector query method based on an unstructured data set. Referring to fig. 1, fig. 1 shows a flowchart of the steps of a server-unaware vector query method based on an unstructured data set. As shown in fig. 1, the method includes:
step S101, a batch query request is acquired, where the batch query request includes a plurality of vector query requests.
In a specific implementation, the batch query request includes a plurality of vector query requests; in the actual query process, calculation needs to be performed for each vector query request in the batch query request, and different vector query requests involve or require different vectors.
In an optional embodiment, the step S101, obtaining a batch query request includes:
step S1011, receiving query request information input by a user, where the query request information includes at least: the vector of the batch query request, the precision requirement of the search, the time of the expected query processing, and the number of nearest neighbor vectors returned.
Step S1012, converting the query request information into a search configuration and a resource configuration through a declarative application programming interface.
Step S1013, executing the search configuration and the resource configuration, and generating the batch query request.
In the actual application process, the user submits a corresponding job (inputs the query request information). Through the declarative application programming interface of this embodiment, the user's query request information is automatically converted into a concrete search configuration and resource configuration; the search configuration and resource configuration are then executed, generating the batch query request. In this way, the system automatically generates the search configuration and resource configuration without the user manually configuring the corresponding resources for the query, and then generates the batch query request. This embodiment provides a set of "serverless" declarative application programming interfaces (APIs) that hide a large number of resource configuration details (such as the number and type of GPUs and their communication mode); a developer only needs to make simple declarations about their query task (such as the data set, precision, and expected completion time) at this set of interfaces, and the system automatically allocates and schedules resources for the task, greatly reducing development complexity.
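As an illustration, a minimal sketch of what such a declarative interface might look like (all field names, mapping rules, and functions here are hypothetical, not the patent's actual API):

```python
from dataclasses import dataclass

@dataclass
class QueryJob:
    """Hypothetical declarative job: only what the user states."""
    query_vectors: list        # vectors of the batch query request
    recall_target: float       # precision requirement of the search
    deadline_ms: float         # expected query processing time
    top_k: int                 # number of nearest-neighbor vectors returned

def compile_job(job: QueryJob):
    """Sketch of turning the declaration into search and resource configs.
    The mapping rules below are illustrative placeholders only."""
    search_config = {
        "nprobe": max(1, int(64 * job.recall_target)),  # clusters to probe
        "top_k": job.top_k,
    }
    resource_config = {
        "num_gpus": 1 if job.deadline_ms > 100 else 2,  # placeholder policy
        "batching": "global_queue",
    }
    return search_config, resource_config

job = QueryJob(query_vectors=[[0.1] * 128], recall_target=0.9,
               deadline_ms=50, top_k=10)
print(compile_job(job))
```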
Step S102, searching for a plurality of vector clusters corresponding to the batch query request by utilizing an IVF index constructed offline on an unstructured data set, and generating a query plan, wherein the query plan represents: a transmission order in which the plurality of vector clusters are transmitted from host memory into the video memory of a GPU, and a calculation order in which the vector query requests are calculated on the vector clusters; wherein each of the vector clusters is divided into a plurality of balanced vector clusters.
In this embodiment, an inverted file (IVF) index is constructed in the vector database in advance, which refers to the technique of k-means clustering all vectors in the vector database and dividing them into a plurality of vector clusters. Specifically, unstructured data in deep learning is converted into high-dimensional feature vectors, and a large unstructured data set is stored in the form of vectors, yielding a vector database; query and retrieval of the unstructured data are realized through queries over the vectors in the vector database. Based on the pre-constructed IVF index, the query proceeds according to the batch query request obtained in step S101: each vector query request in the batch query request contains a query vector, and the several vector clusters closest to each query vector can be determined. In this embodiment, the large-scale vector data set is divided into a plurality of vector clusters in advance by k-means clustering, so that only the several vector clusters nearest to the query vector are searched during an online query; all vectors need not be traversed one by one, which improves query performance and reduces latency.
In this embodiment, through the offline IVF index, the vector clusters corresponding to all the vector query requests are found, and a matrix of query vectors versus searched vector clusters is constructed as the query plan, where each query vector corresponds to one vector query request. The query plan represents the transmission order of the vector clusters and the calculation order of the vector query requests on those vector clusters.
Referring to FIG. 2, FIG. 2 shows a schematic diagram of an initial query plan. As shown in FIG. 2, the horizontal axis entries G1, G2 and G3 represent the transmission order of the query plan, C1, C2, C3, C4, C5, C6 represent different vector clusters, and the vertical axis entries Query-1 (Q1), Query-2 (Q2) and Query-3 (Q3) represent different vector query requests. For example, suppose the video memory capacity of the GPU is three vector clusters, i.e., the GPU can hold only three vector clusters at once. According to the query plan shown in FIG. 2, the first transmission G1 transfers the three vector clusters C1, C2, C3 from host memory to the GPU video memory; query request Q1 is computed on C1, Q2 on C2, and Q3 on C3, and after the calculation is completed the three vector clusters are evicted to host memory. The second transmission G2 then transfers the three vector clusters C4, C5, C6 from host memory to the GPU video memory; Q1 is computed on C4, Q2 on C5, and Q3 on C6, and after the calculation is completed the three vector clusters are evicted to host memory. The third transmission G3 transfers C1, C2, C3 from host memory to the GPU video memory again; Q1 is computed on C3, Q2 on C1, and Q3 on C2, and after the calculation is completed the three vector clusters are evicted to host memory. As can be seen from FIG. 2, the initial query plan includes the transmission order of the vector clusters from host memory to the GPU video memory (e.g., first C1, C2, C3, then C4, C5, C6, and finally C3, C1, C2 again) and the calculation order of the vector query requests on the vector clusters (e.g., the calculation order in G3 is: Q1 on C3, Q2 on C1, Q3 on C2).
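As an illustration of this structure, the following is a minimal sketch (all names hypothetical, not the patent's implementation) of representing the query plan of FIG. 2 as a query-by-cluster matrix built from IVF probe results:

```python
import numpy as np

def build_query_plan(probe_results, num_clusters):
    """probe_results[i] lists the ids of the vector clusters nearest to
    query i. plan[i, j] == 1 means query i must be computed on cluster j."""
    plan = np.zeros((len(probe_results), num_clusters), dtype=np.int8)
    for qid, clusters in enumerate(probe_results):
        plan[qid, clusters] = 1
    return plan

# Mirroring FIG. 2 (0-indexed): Q1 touches C1, C4, C3; Q2: C2, C5, C1; Q3: C3, C6, C2.
plan = build_query_plan([[0, 3, 2], [1, 4, 0], [2, 5, 1]], num_clusters=6)
print(plan)
```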
In an alternative embodiment, the vector clusters are divided into a plurality of balanced vector clusters by the steps of:
step S1021, dividing each vector cluster according to the size of the minimum vector cluster to obtain a plurality of candidate vector clusters; specifically, an IVF index is constructed according to a k-means clustering method to obtain n vector clusters, the n vector clusters are different in size, and the smallest vector cluster is determined. And dividing each vector cluster by taking the size of the minimum vector cluster as a unit to obtain a plurality of candidate vector clusters.
Step S1022, determining whether the variance of the candidate vector cluster size is smaller than a preset variance value.
Step S1023, determining the candidate vector clusters as the balanced vector clusters in the case that the variance of the candidate vector cluster sizes is smaller than the preset variance value. Specifically, when the variance of the candidate vector cluster sizes is smaller than the preset variance value, the candidate vector clusters are similar in size with only small differences, and can serve as balanced vector clusters.
Step S1024, recursively halving the dividing unit in the case that the variance of the candidate vector cluster sizes is greater than or equal to the preset variance value, and re-dividing the vector clusters by the halved unit to generate new candidate vector clusters, until the variance of the candidate vector cluster sizes is smaller than the preset variance value. Specifically, when the variance of the candidate vector cluster sizes is greater than or equal to the preset variance value, the size differences among the candidate vector clusters are still too large and have not converged to the expected range; the vector clusters are then re-divided using half of the minimum vector cluster size from step S1021 as the new dividing unit, generating new candidate vector clusters, and the variance is checked again. If it still exceeds the preset variance value, the dividing unit is halved again and the candidate vector clusters are regenerated, until the variance of the candidate vector cluster sizes is smaller than the preset variance value.
In implementation, after the IVF index is built offline in advance, each generated vector cluster is divided into a plurality of balanced vector clusters: by calculating the variance of the candidate vector cluster sizes, a division whose variance is smaller than the preset variance value is accepted as the set of balanced vector clusters, so that the balanced vector clusters are almost equal in size.
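The splitting procedure of steps S1021 to S1024 can be sketched as follows; this is a hedged illustration (function and variable names are assumptions, and the variance threshold is assumed to be positive):

```python
import numpy as np

def balance_clusters(cluster_sizes, variance_threshold):
    """Split each cluster into chunks of `unit` vectors, recursively halving
    `unit` until the chunk-size variance falls below the threshold
    (steps S1021-S1024). variance_threshold is assumed > 0."""
    unit = min(cluster_sizes)          # start from the smallest cluster's size
    while True:
        # Split every cluster into chunks of at most `unit` vectors.
        chunks = []
        for size in cluster_sizes:
            full, rem = divmod(size, unit)
            chunks.extend([unit] * full + ([rem] if rem else []))
        if np.var(chunks) < variance_threshold:
            return chunks, unit        # these chunks are the balanced clusters
        unit = max(1, unit // 2)       # recursively halve the dividing unit
```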
During vector queries, some long-tailed computation blocks (blocks) stall the entire compute kernel (kernel), so that only part of the GPU's stream processors (streaming multiprocessors) are utilized at any given time. Referring to FIG. 3, FIG. 3 shows a schematic diagram of the time and space utilization of the stream processors in a GPU. As shown in FIG. 3, during the actual vector query process the GPU contains a plurality of stream processors (SM1, SM2, SM3, SM4 in FIG. 3), and each stream processor performs the calculation of a vector query request on a vector cluster: SM1 executes the calculation of query request Q1 on vector cluster C1, SM2 executes Q2 on C2, and SM3 executes Q3 on C3. It should be noted that, owing to the imbalance of the k-means clustering algorithm, the vector clusters differ in size, so the time each stream processor needs for its calculation differs; the larger the vector cluster, the longer the calculation takes. As shown in FIG. 3, where the horizontal arrow indicates the direction of the time axis, vector cluster C1 is far larger than C2 and C3, so stream processor SM1 takes much longer to compute than SM2 and SM3. The execution time of query request Q1 on vector cluster C1 thus slows down the entire GPU kernel computation, leaving SM2 and SM3 idle and reducing the time utilization of the stream processors in the GPU.
In order to solve the above problems, the embodiments of the present application provide a vector cluster balancing technique: each vector cluster is divided in advance into a plurality of balanced vector clusters of similar size, and transmission and calculation are performed at the granularity of balanced vector clusters. This avoids the situation where a query request's execution time on individual oversized vector clusters drags down the whole kernel computation, and solves the problem of insufficient time utilization of the GPU stream processors.
Step S103, optimizing the query plan, eliminating redundant transmission in the query plan, and obtaining the optimized query plan.
Redundant transmissions arise easily under the original query plan. As shown in FIG. 2, the first transmission G1 transfers the three vector clusters C1, C2, C3 from host memory to the GPU video memory, the second transmission G2 transfers C4, C5, C6, and the third transmission G3 transfers C1, C2, C3 to the GPU video memory again. The above procedure transmits vector clusters 9 times in total, but if the order of G2 and G3 is exchanged, only 6 vector cluster transmissions are needed; the initial query plan thus produces 3 redundant transmissions, causing C1, C2, C3 to be transmitted repeatedly from host memory to the GPU video memory, wasting transmission resources and affecting overall vector query efficiency. The embodiments of the present application therefore optimize the query plan and eliminate its redundant transmissions, obtaining an optimized query plan that avoids repeated transmission of the same vector cluster and improves the transmission and calculation efficiency of the vector clusters.
In an alternative embodiment, the step S103, optimizing the query plan, and eliminating redundant transmissions in the query plan, includes:
step S1031, adjusting the transmission frequency of each vector cluster to be at most one transmission.
Step S1032, the calculation sequence of the vector clusters existing in the video memory of the GPU is adjusted forward.
In vector query processing, the processing of each subset (i.e., each time a vector query request is computed over a cluster of vectors) is independent, based on which the query plan of each query request can be improved without changing the correctness of its query results. The present embodiment improves query planning from both aspects within and between batch processing to completely eliminate redundant transmission of batch query requests.
On the one hand, optimization within the batch process is performed, corresponding to step S1031. Referring to FIG. 4, FIG. 4 shows a schematic diagram of the query plan optimization result within a batch process. In the example shown in FIG. 4, the vector query involves 6 vector clusters in total, C1 through C6, and the batch query request comprises 3 vector query requests: Q1, Q2, Q3. Let the matrix in FIG. 4 be M. Each element M[i, j] indicates whether the corresponding vector query request Qi is computed on the corresponding vector cluster Cj: 0 means that vector cluster Cj is unrelated to vector query request Qi, i.e., Cj does not contain vectors required by Qi; 1 means that Cj is related to Qi, i.e., Cj contains vectors required by Qi.
This optimization is based on the following observation: the optimal number of transmissions for a batch query is never less than the number of vector clusters involved. The optimization goal is therefore to improve the original query plan so that the total number of transmissions in the improved plan equals the number of involved vector clusters, while guaranteeing the correctness of the result. When a vector cluster is transmitted into the GPU video memory, the improved query plan immediately processes all query requests related to that cluster. In this way, the optimized query plan transmits each vector cluster at most once, eliminating redundant data transmission and realizing query plan optimization within batch processing.
On the other hand, optimization between batch requests is performed, corresponding to step S1032. Referring to FIG. 5, FIG. 5 shows a schematic diagram of the query plan optimization result between batch processes. As shown in FIG. 5, suppose the GPU video memory capacity is 3 clusters and initially holds two vector clusters, C5 and C6, i.e., C5 and C6 were transmitted to the GPU video memory for the previous batch, and the next batch also needs to process C5 and C6. In this case, the optimization between batch requests adjusts the calculation order of the vector clusters already in the GPU video memory forward, specifically to the head of the processing order, i.e., C5 and C6 are moved to the front of the overall query request processing order. In this way, the vector clusters transmitted to the GPU video memory in the previous batch are fully reused, further reducing transmission overhead.
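Putting the two optimizations together (transmit each involved cluster at most once per step S1031, and schedule clusters already resident in GPU memory first per step S1032), the following is a hedged sketch, with `M` the query-plan matrix described above and `resident` a hypothetical set of cluster ids left in GPU video memory by the previous batch:

```python
import numpy as np

def optimize_plan(M, resident):
    """Derive a transmission/calculation order from the query-plan matrix M
    (M[i, j] == 1 iff query i must be computed on cluster j).

    Step S1031: each involved cluster is transmitted at most once, and all
    queries touching it are computed as soon as it arrives.
    Step S1032: clusters already resident in GPU memory are ordered first,
    so they are reused before any new transmission happens."""
    involved = [j for j in range(M.shape[1]) if M[:, j].any()]
    order = [j for j in involved if j in resident] + \
            [j for j in involved if j not in resident]
    schedule = []
    for j in order:
        queries = np.nonzero(M[:, j])[0].tolist()
        needs_transfer = j not in resident
        schedule.append((j, needs_transfer, queries))
    return schedule  # [(cluster, transfer?, queries to run on it), ...]
```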
Step S104, acquiring transmission time information and calculation time information, and reordering the optimized query plan at the granularity of balanced vector clusters to obtain an optimal execution sequence;
in this embodiment, in order to acquire the transmission time information and the calculation time information, an analyzer unit is constructed in advance from the query history or by preprocessing some queries online; the analyzer can predict the data transmission time or calculation time of each query online, that is, according to the optimized query plan, the analyzer can predict the time required to transmit the data of the batch query request and the time required for the GPU calculation. According to the obtained time information, the optimized query plan can be further reordered to obtain the optimal execution sequence.
In an alternative embodiment, the acquiring the transmission time information and the calculation time information includes:
calculating the transmission time information of each vector cluster according to the following formula:
transmission time = a × m × s_B + b, where s_B is the size of a balanced vector cluster in the vector cluster, a and b are transmission time parameters obtained in advance through least-squares fitting, and m represents the number of balanced vector clusters in the vector cluster;
calculating the calculation time information of each vector cluster according to the following formula:
calculation time = A × m × s_B + B, where A and B are calculation time parameters obtained in advance through least-squares fitting;
the transmission time information represents time information required by each vector cluster to be transmitted from the main memory to the video memory of the GPU, and the calculation time information represents time information required by calculating a corresponding query request in each vector cluster.
In this embodiment, the analyzer is primarily used to predict the time to transfer data over PCIe from the host to the GPU video memory; the transfer is typically performed by invoking CUDA's cudaMemcpyAsync interface. The overall transmission time can be divided into two parts: the real propagation time over PCIe and the interface call overhead. Specifically, assuming that the current packet contains m balanced vector clusters, the transmission time can be approximated as transmission time = a × m × s_B + b, where s_B is the size of a balanced vector cluster, and a and b can be fitted online by least squares, i.e., they are time parameters set from historical data or experience. The analyzer predicts the calculation time in the same way, according to calculation time = A × m × s_B + B, where A and B are calculation time parameters obtained in advance through least-squares fitting; the calculation time is a linear function of the amount of computation.
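As a hedged sketch of the analyzer (all names are assumptions), the parameters a and b can be fitted by ordinary least squares from measured (size, time) samples, and the same routine fits A and B for computation:

```python
import numpy as np

def fit_linear_cost(sizes, times):
    """Least-squares fit of time = a * size + b from observed samples,
    e.g. gathered from query history or online preprocessing."""
    X = np.vstack([sizes, np.ones_like(sizes)]).T
    (a, b), *_ = np.linalg.lstsq(X, times, rcond=None)
    return a, b

def predict_time(a, b, m, cluster_size):
    """Predicted time for a packet of m balanced clusters of `cluster_size`."""
    return a * m * cluster_size + b

# Usage sketch: fit transmission parameters from profiled PCIe copies.
sizes = np.array([1e6, 2e6, 4e6, 8e6])   # bytes transferred (illustrative)
times = np.array([0.9, 1.6, 3.1, 6.0])   # measured milliseconds (illustrative)
a, b = fit_linear_cost(sizes, times)
print(predict_time(a, b, m=4, cluster_size=1e6))
```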
The present embodiment uses the analyzer to predict the transmission and calculation time of each packet, i.e., of each vector cluster (comprising a plurality of balanced vector clusters); the predicted time information (transmission time information and calculation time information) is used by the pipeline scheduler to help compute the optimal pipeline plan. Specifically, after receiving the optimized query plan, the pipeline scheduler further reorders and groups the plan to achieve maximum transmission-calculation overlap and minimum pipeline overhead. The simplest approach would be to enumerate all pipeline scheduling schemes and pick the one with the smallest running time, but running such an enumeration online incurs significant algorithmic overhead. Therefore, the embodiment of the present application decomposes the whole search problem into two sub-problems: first, step S104 is executed, reordering the optimized query plan at the granularity of balanced vector clusters to obtain the optimal execution sequence; after the optimal execution sequence at balanced-vector-cluster granularity is determined, step S105 is executed to search for the optimal grouping plan over that sequence.
In an optional implementation manner, the step S104 of obtaining transmission time information and calculation time information, and reordering the optimized query plan with the balanced vector cluster as granularity includes:
step S1041, moving balanced vector clusters whose transmission time is zero to the head of the execution sequence according to the transmission time information;
step S1042, moving balanced vector clusters with longer calculation times forward in the execution sequence according to the calculation time information.
Specifically, referring to FIG. 6, FIG. 6 shows a reordering schematic of the pipeline scheduler. Suppose all balanced vector clusters are {B1, B2, B3, ..., Bm}, listed in execution order. As shown in FIG. 6, there are four balanced vector clusters in total, B1, B2, B3, B4; PCIe denotes data transmission, GPU denotes the corresponding kernel computation, each horizontal bar is a time span, and a longer bar means a longer transmission or calculation time for that balanced vector cluster. As shown in part (a) of FIG. 6, B4 is already resident in the GPU and does not need to be transferred, and the execution sequence is B1 → B2 → B3 → B4. Each balanced vector cluster can only be computed after its transmission completes.
According to step S1041, moving the balanced vector cluster whose transmission time is zero to the head of the execution sequence, based on the transmission time information, increases the overlap and parallelism of transmission and calculation. As shown in part (b) of FIG. 6, after B4 (the balanced vector cluster with zero transmission time) is moved to the head of the processing order, the calculation of B4 can be entirely covered by the transmission time of B1.
According to step S1042, balanced vector clusters with longer calculation times are moved forward based on the calculation time information: the balanced vector cluster with the longest calculation time is moved to the head of the transmission order and toward the front of the execution sequence. As shown in part (c) of FIG. 6, since B3 has the longest calculation time in the packet, B3 is moved to the first position of the transmission order and transferred to the GPU video memory first, so its execution order also moves forward; alternatively, it may be placed immediately after the zero-transmission-time balanced vector cluster adjusted in step S1041 (e.g., right after B4 in FIG. 6). In this way, the longer calculation time of B3 can be covered as much as possible by the transmission time of the other balanced vector clusters. Steps S1041 and S1042 can be implemented by a corresponding sorting algorithm to obtain the optimal execution sequence.
Existing parallel modes of transmission and computation cause bubbles in the pipeline, so that computation and transmission do not overlap sufficiently. For example, in part (a) of FIG. 6, the stream processors in the GPU must wait for the PCIe transmission of B1 to complete before computation can start; the waiting time so produced is a bubble. To solve this problem, this embodiment computes the optimal execution sequence by adjusting the execution sequence (transmission order and calculation order) of the balanced vector clusters within each vector cluster, so that the calculation time of each balanced vector cluster is covered as much as possible by the transmission time of other balanced vector clusters, reducing pipeline bubbles and improving vector query efficiency.
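A hedged sketch of this reordering (names and tie-breaking are assumptions): balanced clusters with zero transmission time go first, and the remaining clusters are sorted by descending predicted calculation time so that long computations start early and hide behind later transmissions:

```python
def reorder(clusters, trans_time, comp_time):
    """clusters: list of balanced-cluster ids.
    trans_time / comp_time: dicts of predicted times from the analyzer.
    Returns an execution sequence implementing steps S1041-S1042."""
    resident = [c for c in clusters if trans_time[c] == 0]   # S1041: first
    pending = [c for c in clusters if trans_time[c] > 0]
    # S1042: longer computations earlier, so they overlap later transfers.
    pending.sort(key=lambda c: comp_time[c], reverse=True)
    return resident + pending
```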
Step S105, grouping the optimal execution sequence by using a dynamic programming algorithm to obtain a grouping plan.
After determining the optimal execution sequence at the granularity of balanced vector clusters, the pipeline scheduler searches for the optimal grouping plan over this sequence using a dynamic programming algorithm. In the related art, the basic pipeline strategy is to pipeline vector cluster by vector cluster, i.e., to perform transmission and calculation one cluster at a time. However, such fine-grained pipelining introduces two types of pipeline overhead. The first is the overhead of frequently calling the transmission interface: since the total amount of data to transmit is fixed, pipelining at vector cluster granularity maximizes the number of interface calls and thus the call overhead. The other is the synchronization overhead between each packet's transmission and computation: at cluster granularity every vector cluster requires a synchronization, so the synchronization overhead is also maximal. To reduce both kinds of overhead, this embodiment proposes using a dynamic programming algorithm to find the optimal grouping scheme among the possible combinations.
In an alternative embodiment, the step S105, using a dynamic programming algorithm, groups the optimal execution sequence to obtain a grouping plan, includes:
step S1051, using a dynamic programming algorithm, traversing the nodes of the search tree, each of the nodes representing a grouping scheme, the child nodes of each node representing sub-schemes of the grouping scheme.
In a specific implementation, the dynamic programming algorithm maintains a global variable, the optimal time, recording the best pipeline time found so far in the current search space. Each pipeline scheme serves as a node in the search tree; for example, "move B3 to the first position of the transmission order" is a node of the search tree, and "move B3 to the first position of the transmission order, and set B2 as the next in the transmission order" is a child node of that node.
Step S1052, calculating the predicted time interval required to execute the grouping scheme of the current node, by assuming that the remaining transmission and calculation overlap completely and ignoring pipeline processing overhead.
Specifically, when traversing the nodes in the search tree, a time interval required for execution of the corresponding grouping scheme is estimated for each node, the time interval representing the longest time and shortest time that are expected to be required to execute the grouping scheme. In this embodiment, the time interval corresponding to the scheme in the current subtree may be calculated by completely overlapping the remaining transmission and calculation and ignoring the pipeline processing overhead.
Step S1053, pruning the current node and its child nodes when the minimum value of the time interval is greater than the minimum value of the time intervals of other nodes.
In this embodiment, a heuristic pruning method is combined into the dynamic programming algorithm: if the minimum value of a node's time interval is greater than the maintained optimal time (i.e., the minimum of the time intervals of other nodes), the node and all its child nodes (its subtree) are pruned, so none of its child nodes need be computed, further reducing the dynamic programming search space and saving computing resources.
Step S1054, determining the grouping scheme corresponding to the nodes finally remaining in the search tree after pruning as the grouping plan. The time complexity of the dynamic programming algorithm is polynomial, and with the pruning heuristic the algorithm can quickly find the optimal grouping scheme at runtime, minimizing the extra system overhead caused by the pipeline.
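The search can be sketched as a pruned enumeration over contiguous groupings of the optimal execution sequence; this illustration uses branch and bound with the lower bound of steps S1052 and S1053 rather than reproducing the patent's polynomial dynamic program, and the cost model (one interface-call overhead per group transfer, one synchronization overhead per group computation) and all names are assumptions:

```python
import math

def schedule_groups(order, trans, comp, call_overhead, sync_overhead):
    """Pruned search over contiguous groupings of `order` (balanced-cluster
    ids). trans/comp: per-cluster predicted times from the analyzer."""
    n = len(order)
    best = {"time": math.inf, "groups": None}

    # Suffix sums for the optimistic bound (perfect overlap, no overhead).
    suf_t = [0.0] * (n + 1)
    suf_c = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suf_t[i] = suf_t[i + 1] + trans[order[i]]
        suf_c[i] = suf_c[i + 1] + comp[order[i]]

    def search(i, trans_end, comp_end, groups):
        if i == n:
            if comp_end < best["time"]:
                best["time"], best["groups"] = comp_end, list(groups)
            return
        # Lower bound: remaining work with full overlap, no pipeline cost.
        if max(trans_end + suf_t[i], comp_end + suf_c[i]) >= best["time"]:
            return  # prune this node and its whole subtree (step S1053)
        for j in range(i + 1, n + 1):       # next group = order[i:j]
            g_t = sum(trans[c] for c in order[i:j]) + call_overhead
            g_c = sum(comp[c] for c in order[i:j]) + sync_overhead
            t_end = trans_end + g_t
            c_end = max(comp_end, t_end) + g_c  # compute waits for transfer
            groups.append(order[i:j])
            search(j, t_end, c_end, groups)
            groups.pop()

    search(0, 0.0, 0.0, [])
    return best["groups"], best["time"]

order = ["B3", "B4", "B1", "B2"]
trans = {"B1": 2.0, "B2": 1.5, "B3": 2.5, "B4": 0.0}
comp = {"B1": 1.0, "B2": 0.8, "B3": 3.0, "B4": 0.5}
print(schedule_groups(order, trans, comp, call_overhead=0.2, sync_overhead=0.1))
```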
And step S106, pushing each group into a global group queue for transmission and calculation according to the group plan, and obtaining a vector query result.
In an optional implementation manner, the step S106 pushes each group to a global packet queue for transmission and calculation according to the packet plan, so as to obtain a vector query result, which includes:
Step S1061, extracting a transmission task from the global packet queue through a storage manager, and maintaining a local packet transmission queue;
step S1062, according to the local packet transmission queue, the storage manager notifies the kernel controller to execute a corresponding calculation task after executing the transmission task of the current packet, and starts executing the transmission task of the next packet;
step S1063, extracting, by the kernel controller, a calculation task from the global packet queue, and maintaining a local packet calculation queue;
step S1064, according to the local packet calculation queue, the kernel controller executes the calculation task of the current packet, and after executing all packets, the kernel controller generates a vector query result.
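A hedged sketch of this producer/consumer structure, with two threads standing in for the storage manager and kernel controller; the `transfer` and `compute` callables are placeholders for the PCIe copy and the GPU kernel launch, and all names are assumptions:

```python
import queue
import threading

global_groups = queue.Queue()     # grouping plan pushed here (step S106)
ready_to_compute = queue.Queue()  # local hand-off to the kernel controller

def storage_manager(transfer):
    """Pops groups, performs their host->GPU transfer, then hands each
    group to the kernel controller and immediately starts the next one."""
    while True:
        group = global_groups.get()
        if group is None:
            ready_to_compute.put(None)
            break
        transfer(group)               # PCIe copy of the group's clusters
        ready_to_compute.put(group)   # notify kernel controller (S1062)

def kernel_controller(compute, results):
    """Computes each transferred group; after all groups, results are done."""
    while True:
        group = ready_to_compute.get()
        if group is None:
            break
        results.append(compute(group))  # GPU kernel on this group (S1064)

# Usage sketch with stand-in transfer/compute functions.
results = []
t1 = threading.Thread(target=storage_manager, args=(lambda g: None,))
t2 = threading.Thread(target=kernel_controller,
                      args=(lambda g: f"res({g})", results))
t1.start(); t2.start()
for g in [["C1", "C2"], ["C3"]]:
    global_groups.put(g)
global_groups.put(None)
t1.join(); t2.join()
print(results)
```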
In a possible implementation manner, in step S1064, the kernel controller performs a calculation task of the current packet, including:
dividing each balanced vector cluster in the current group into a plurality of sub-vector clusters, exploiting the contiguous-address property of the balanced vector clusters;
taking the calculation of a vector query request on each sub-vector cluster as a block calculation task, and evenly distributing the block calculation tasks among the stream processors on the GPU;
Each of the stream processors performs the assigned block computation task.
When vector queries are performed, there is not only the problem of low time utilization but also the problem of low space utilization. Specifically, the GPU contains multiple stream processors, and it easily happens that the whole compute kernel has too few computation blocks to occupy all of the GPU's stream processors, i.e., the kernel computation corresponding to the current batch of query requests cannot occupy every stream processor. As shown for SM4 in FIG. 3, individual stream processors (SM4) remain idle, resulting in low space utilization.
To solve the problem of low space utilization, this embodiment provides a dynamic kernel expansion method: exploiting the fact that after balancing the vector cluster addresses are contiguous, i.e., the vectors within a balanced vector cluster are stored at consecutive addresses, a balanced vector cluster is dynamically and rapidly divided into multiple sub-vector clusters at runtime. An original block computation (block), i.e., the computation of one vector query request on one balanced vector cluster, is divided evenly into several smaller block computations; each block computation runs on a stream processor, and each stream processor can accommodate several block computations. Referring to FIG. 7, FIG. 7 shows a schematic diagram of the dynamic kernel expansion computation process. As shown in FIG. 7, the horizontal axis is the time axis and each block represents the computation of one sub-vector cluster; the divided sub-vector clusters are evenly distributed to the stream processors in the GPU, ensuring that each stream processor has a similar amount of work and realizing dynamic kernel expansion.
In the embodiment of the present application, before the kernel controller executes the calculation tasks of the current group according to the local group calculation queue, the balanced vector clusters in the current group are divided into multiple sub-vector clusters, and the sub-vector clusters are evenly distributed among the stream processors in the GPU according to how busy each stream processor currently is, so that no stream processor sits idle and the occupancy of the GPU is improved.
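A hedged sketch of the splitting logic (in a real system this would parameterize a CUDA kernel launch; here plain Python shows how a contiguous balanced cluster is cut into equal sub-ranges and spread evenly across stream processors; all names are assumptions):

```python
def expand_blocks(cluster_offset, cluster_len, num_splits):
    """Split one balanced cluster (a contiguous [offset, offset+len) range
    of vectors) into num_splits roughly equal sub-vector clusters."""
    step = (cluster_len + num_splits - 1) // num_splits
    return [(cluster_offset + s, min(step, cluster_len - s))
            for s in range(0, cluster_len, step)]

def assign_to_sms(sub_clusters, num_sms):
    """Evenly distribute sub-cluster block tasks across stream processors."""
    sms = [[] for _ in range(num_sms)]
    for i, sub in enumerate(sub_clusters):
        sms[i % num_sms].append(sub)   # round-robin keeps per-SM load similar
    return sms

# Usage: one cluster of 4096 vectors split 8 ways over 4 SMs.
subs = expand_blocks(cluster_offset=0, cluster_len=4096, num_splits=8)
print(assign_to_sms(subs, num_sms=4))
```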
Example One
Referring to fig. 8, fig. 8 shows a schematic diagram of the query process of the vector query method. As shown in fig. 8, the vector query method is mainly divided into two parts: an online part and an offline part.
For the offline part, an IVF (inverted file) index needs to be built in advance for the data set; it uses k-means clustering to divide the vectors of the data set into a plurality of vector clusters. For the generated vector clusters, the index is extended by the vector cluster balancing module, which divides each vector cluster into a plurality of balanced vector clusters of similar size. The data set is then loaded into host memory and transferred online to GPU video memory over PCIe under the control of the storage management system.
In addition, this embodiment also designs an analyzer component in advance, built from query history or by preprocessing some queries online, which can be used to predict online the data transmission time or calculation time of each query.
For the online part, according to the vector query method, the specific steps are as follows:
step 1: a batch query request is received from a user online. Specifically, query request information input by a user can be converted into corresponding search configuration and resource configuration through a declarative application programming interface, batch query requests are generated,
step 2: and selecting the latest n vector clusters as query data through an IVF index constructed offline, and generating a query plan.
Step 3: and (3) optimizing the whole query plan based on the vector cluster, eliminating redundant transmission in the query plan, and obtaining the optimized query plan.
Step 4: the runtime pipeline scheduler receives the optimized query plan together with the transmission time information and calculation time information provided by the analyzer. Based on this information, the pipeline scheduler reorders the query plan using a greedy algorithm to obtain the optimal execution sequence, and then groups the optimal execution sequence using a dynamic programming algorithm to obtain the grouping plan; the pipeline scheduler thereby finds the best balance between pipeline efficiency and pipeline overhead.
Step 5: each group in the packet plan is pushed into the global packet queue for transmission and computation.
Step 6: the kernel controller and the storage manager each extract tasks from the global grouping queue and maintain corresponding local packet queues. As long as its local packet queue is not empty, the storage manager starts transmission immediately; after the transmission of the current packet completes, it notifies the kernel controller to perform the corresponding calculation and pops the next packet to start its transmission. If the kernel calculation engine is idle at that moment, the kernel controller computes the current packet, adopting dynamic kernel expansion to divide the balanced vector clusters into multiple sub-vector clusters and fully utilize every GPU stream processor.
Step 7: after the calculation of all the groups is completed in batch processing, a final vector query result is generated, and the kernel controller returns the vector query result to the user.
The embodiments of the present application provide query plan optimization based on data and vector clusters to solve the problem of redundant data transmission; solve the under-utilization of the GPU's stream processors in time and in space through vector cluster balancing and dynamic kernel expansion, respectively; and further provide a runtime pipeline scheduler that reorders and groups to maximize the parallelism of transmission and calculation. The embodiments of the present application thereby surpass existing vector query techniques and achieve higher computing performance and cost performance.
A second aspect of the present application provides a server-unaware vector query system based on unstructured data sets. Referring to fig. 9, fig. 9 shows a schematic structural diagram of a server-unaware vector query system based on unstructured data sets. As shown in fig. 9, the system includes:
a declarative application programming interface for obtaining a batch of query requests, the batch of query requests including a plurality of vector query requests;
the vector database is used for searching for a plurality of vector clusters corresponding to the batch query request by utilizing an IVF index constructed offline on an unstructured data set, to generate a query plan, wherein the query plan represents: a transmission order in which the plurality of vector clusters are transmitted from host memory into the video memory of a GPU, and a calculation order in which the vector query requests are calculated on the vector clusters; wherein each of the vector clusters is divided into a plurality of balanced vector clusters;
the query plan optimizing module is used for optimizing the query plan, eliminating redundant transmission in the query plan and obtaining an optimized query plan;
the pipeline scheduler is used for acquiring transmission time information and calculation time information, and reordering the optimized query plan at the granularity of balanced vector clusters to obtain an optimal execution sequence;
The pipeline scheduler is further used for grouping the optimal execution sequence by using a dynamic programming algorithm to obtain a grouping plan;
and the GPU processor is used for pushing each group into a global grouping queue for transmission and calculation according to the grouping plan to obtain a vector query result.
In an alternative embodiment, the query plan optimization module includes:
the batch processing internal optimization sub-module is used for adjusting the number of transmissions of each vector cluster to at most one;
and the batch processing external optimization sub-module is used for adjusting the calculation sequence of the vector clusters existing in the video memory of the GPU forward.
In an alternative embodiment, the apparatus further comprises an analyzer:
the analyzer is configured to calculate the transmission time information of each of the vector clusters according to the following formula:
transmission time $T_{\mathrm{trans}} = \sum_{i=1}^{m} (a \cdot s_i + b)$, where $s_i$ is the size of the $i$-th balanced vector cluster in the vector cluster, a and b are transmission time parameters obtained in advance through least-squares fitting, and m denotes the number of balanced vector clusters in the vector cluster;
the analyzer is further configured to calculate the calculation time information of each of the vector clusters according to the following formula:
calculation time $T_{\mathrm{calc}} = \sum_{i=1}^{m} (A \cdot s_i + B)$, where A and B are calculation time parameters obtained in advance through least-squares fitting;
The transmission time information represents the time required to transmit each vector cluster from the main memory into the video memory of the GPU, and the calculation time information represents the time required to calculate the corresponding query requests on each vector cluster;
the analyzer is further configured to send the calculated transmission time information and the calculated time information to the pipeline scheduler.
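As a concrete illustration of the analyzer's linear cost models, the following Python sketch fits the parameters a, b and A, B by least squares from profiled (size, time) samples and then predicts per-cluster costs with the formulas above. The sample numbers and all names are illustrative assumptions, not values from the patent.

```python
import numpy as np

def fit_linear(sizes, times):
    # Degree-1 least-squares fit: returns (slope, intercept).
    slope, intercept = np.polyfit(sizes, times, 1)
    return slope, intercept

# Illustrative offline profile of a few balanced-cluster sizes (made-up numbers).
sizes = np.array([1e5, 2e5, 4e5, 8e5])   # vectors per balanced cluster
trans = np.array([0.8, 1.5, 2.9, 5.8])   # measured host-to-GPU copy times (ms)
calc  = np.array([0.3, 0.6, 1.1, 2.2])   # measured kernel times (ms)

a, b = fit_linear(sizes, trans)
A, B = fit_linear(sizes, calc)

def predict_times(balanced_sizes):
    # Sum the per-balanced-cluster linear costs over the m balanced clusters.
    t_trans = sum(a * s + b for s in balanced_sizes)
    t_calc  = sum(A * s + B for s in balanced_sizes)
    return t_trans, t_calc
```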
In an alternative embodiment, the system further comprises: the vector cluster balancing module is used for dividing the vector clusters into a plurality of balanced vector clusters with similar sizes through the following steps:
dividing each vector cluster according to the size of the minimum vector cluster to obtain a plurality of candidate vector clusters;
determining whether the variance of the size of the candidate vector cluster is smaller than a preset variance value;
determining the candidate vector cluster as the balance vector cluster under the condition that the variance of the size of the candidate vector cluster is smaller than the preset variance value;
and, under the condition that the variance of the sizes of the candidate vector clusters is larger than or equal to the preset variance value, recursively halving the size of the minimum vector cluster, dividing again with the halved size to generate new candidate vector clusters, until the variance of the sizes of the candidate vector clusters is smaller than the preset variance value.
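The recursive halving loop described above can be sketched as follows; balance_clusters and its arguments are hypothetical names, assuming cluster sizes are counted in numbers of vectors.

```python
import numpy as np

def balance_clusters(cluster_sizes, variance_threshold):
    unit = min(cluster_sizes)          # start from the size of the smallest cluster
    while True:
        chunks = []
        for size in cluster_sizes:
            full, rest = divmod(size, unit)
            chunks.extend([unit] * full)   # full-sized candidate clusters
            if rest:
                chunks.append(rest)        # the leftover piece of this cluster
        if np.var(chunks) < variance_threshold:
            return chunks, unit            # candidates are balanced enough
        unit = max(1, unit // 2)           # otherwise halve the unit and re-divide
```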
In an alternative embodiment, the pipeline scheduler includes:
the first reordering sub-module is used for moving balanced vector clusters whose transmission time is zero to the front of the execution order according to the transmission time information;
and the second reordering sub-module is used for moving balanced vector clusters with larger calculation times earlier in the execution order according to the calculation time information.
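A minimal sketch of these two reordering rules, assuming per-cluster time dictionaries t_trans and t_calc produced by the analyzer (all names hypothetical):

```python
def reorder(plan, t_trans, t_calc):
    # Rule 1: clusters already resident in GPU memory (zero transmission) go first.
    resident = [c for c in plan if t_trans[c] == 0]
    pending  = [c for c in plan if t_trans[c] > 0]
    # Rule 2: among the rest, larger calculation times move earlier,
    # keeping the compute stream busy while later transmissions proceed.
    pending.sort(key=lambda c: t_calc[c], reverse=True)
    return resident + pending
```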
In an alternative embodiment, the pipeline scheduler further comprises:
a node traversing sub-module, configured to traverse nodes of a search tree using a dynamic programming algorithm, where each node represents a grouping scheme, and a sub-node of each node represents a sub-scheme of the grouping scheme;
the time interval calculation sub-module is used for predicting the time interval required by the grouping scheme of the current node, calculated by assuming that the remaining transmission and calculation overlap completely and ignoring pipeline processing overhead;
the pruning sub-module is used for pruning the current node and its child nodes when the minimum value of its time interval is larger than the minimum time interval of the other nodes;
and the determining submodule is used for determining a grouping scheme corresponding to the finally remaining nodes in the search tree after pruning as the grouping plan.
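The following branch-and-bound sketch illustrates one way such a search over grouping schemes with lower-bound pruning could look; the node encoding, the perfect-overlap bound, and all names are illustrative assumptions rather than the patent's exact algorithm.

```python
def group_plan(order, t_trans, t_calc):
    # `order` is the optimal execution order of balanced clusters;
    # each tree node is a partial split of `order` into contiguous groups.
    best = {"time": float("inf"), "plan": None}

    def span(groups):
        # Simulate the pipeline: a group's calculation starts only after its
        # own transmission and the previous group's calculation have finished.
        trans_end = calc_end = 0.0
        for g in groups:
            trans_end += sum(t_trans[c] for c in g)
            calc_end = max(calc_end, trans_end) + sum(t_calc[c] for c in g)
        return calc_end

    def lower_bound(groups, rest):
        # Perfect overlap of the remaining work, ignoring pipeline overhead.
        return span(groups) + max(sum(t_trans[c] for c in rest),
                                  sum(t_calc[c] for c in rest))

    def search(i, groups):
        if lower_bound(groups, order[i:]) >= best["time"]:
            return                            # prune this node and its children
        if i == len(order):
            best["time"], best["plan"] = span(groups), [list(g) for g in groups]
            return
        for j in range(i + 1, len(order) + 1):
            search(j, groups + [order[i:j]])  # child node: next group = order[i:j]

    search(0, [])
    return best["plan"]
```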
In an alternative embodiment, the declarative application programming interface includes:
the query request information receiving sub-module is used for receiving query request information input by a user, the query request information at least comprising: the query vectors of the batch request, the search precision requirement, the expected query processing time, and the number of nearest-neighbour vectors to return;
the conversion submodule is used for converting the query request information into search configuration and resource configuration;
and the request generation sub-module is used for executing the search configuration and the resource configuration to generate the batch query requests.
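As an illustration of this declarative flow, the sketch below turns user-stated goals into a search configuration and a resource configuration; the field names, the translation rules, and to_configs are hypothetical, since the patent does not fix concrete mappings.

```python
from dataclasses import dataclass

@dataclass
class QueryRequestInfo:
    vectors: list          # the batch of query vectors
    recall_target: float   # search precision requirement, e.g. 0.95
    deadline_ms: float     # expected query processing time
    top_k: int             # number of nearest-neighbour vectors to return

def to_configs(info: QueryRequestInfo):
    # Illustrative translation rules only: probe more IVF clusters for higher
    # recall, and size resources to the batch and its deadline.
    search_cfg = {"nprobe": max(1, int(64 * info.recall_target)),
                  "top_k": info.top_k}
    resource_cfg = {"batch_size": len(info.vectors),
                    "deadline_ms": info.deadline_ms}
    return search_cfg, resource_cfg

search_cfg, resource_cfg = to_configs(
    QueryRequestInfo(vectors=[[0.1] * 128] * 32,
                     recall_target=0.95, deadline_ms=50.0, top_k=10))
```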
In an alternative embodiment, the GPU processor includes:
a storage manager for extracting transmission tasks from the global packet queues and maintaining local packet transmission queues;
the storage manager is further configured to, after completing the transmission task of the current packet in the local packet transmission queue, notify the kernel controller to execute the corresponding calculation task and begin executing the transmission task of the next packet;
the kernel controller is used for extracting calculation tasks from the global packet queue and maintaining a local packet calculation queue;
The kernel controller is further configured to execute a calculation task of a current packet according to the local packet calculation queue, and generate a vector query result after executing all packets.
In an alternative embodiment, the core controller includes a dynamic core extension module and a plurality of stream processors;
the dynamic kernel expansion module is used for dividing each balanced vector cluster in the current group into a plurality of sub-vector clusters according to the address characteristics of the balanced vector cluster;
the dynamic kernel expansion module is further configured to take the computation of the vector query request on each sub-vector cluster as a block computation task, and evenly distribute the block computation tasks to the plurality of stream processors;
the stream processor is used for executing the block computing task distributed by the dynamic kernel expansion module.
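A small sketch of this expansion step: each balanced cluster in the current group is cut into address-contiguous sub-clusters, and each (sub-cluster, queries) pair becomes one block calculation task spread evenly over the stream processors. The tuple encoding and all names are illustrative assumptions.

```python
def expand_group(balanced_clusters, num_sms, queries):
    # balanced_clusters: list of (base_address, num_vectors) tuples.
    block_tasks = []
    for base, size in balanced_clusters:
        sub = max(1, size // num_sms)            # target sub-cluster size
        for off in range(0, size, sub):
            block_tasks.append((base + off, min(sub, size - off)))
    # Distribute the block tasks evenly (round-robin) over the stream processors.
    per_sm = [[] for _ in range(num_sms)]
    for i, task in enumerate(block_tasks):
        per_sm[i % num_sms].append((task, queries))
    return per_sm
```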
The embodiment of the invention also provides an electronic device; referring to fig. 10, fig. 10 is a schematic structural diagram of the electronic device according to the embodiment of the invention. As shown in fig. 10, the electronic device 100 includes a memory 110 and a processor 120, the memory 110 being communicatively connected to the processor 120 through a bus; a computer program is stored in the memory 110 and is executable on the processor 120 to implement the steps of the server non-aware vector query method based on unstructured data sets described above.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program/instruction is stored, which when executed by a processor, implements the steps in the server non-perception vector query method based on the unstructured data set disclosed in the embodiment of the invention.
The embodiment of the invention also provides a computer program product, which comprises a computer program/instruction, wherein the computer program/instruction realizes the steps in the server non-perception vector query method based on the unstructured data set disclosed by the embodiment of the invention when being executed by a processor.
In this specification, each embodiment is described in a progressive manner, each embodiment focusing on its differences from the others; for identical or similar parts between embodiments, reference may be made to one another.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, electronic devices, and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or terminal device comprising the element.
The above detailed description of the server non-aware vector query method and system based on unstructured data sets provided by the invention applies specific examples to illustrate the principle and implementation of the invention; the above description of the embodiments is only intended to help in understanding the method and its core idea. Meanwhile, since those skilled in the art may vary the specific embodiments and the scope of application in accordance with the idea of the invention, the contents of this description should not be construed as limiting the invention.

Claims (10)

1. A server non-aware vector query method based on unstructured data sets, the method comprising:
acquiring batch query requests by using a server-unaware declarative application programming interface, wherein the batch query requests comprise a plurality of vector query requests;
searching, by using an IVF index constructed offline on an unstructured dataset, a plurality of vector clusters corresponding to the batch query requests, and generating a query plan, where the query plan represents: a transmission order in which the plurality of vector clusters are transmitted from a main memory into a video memory of a GPU, and a calculation order in which the vector query requests are calculated on the vector clusters; wherein each of the vector clusters is divided into a plurality of balanced vector clusters, the balanced vector clusters being vector clusters, obtained by dividing the vector cluster, whose size variance is smaller than a preset variance value;
Optimizing the query plan, eliminating redundant transmission in the query plan, and obtaining an optimized query plan;
acquiring transmission time information and calculation time information, and reordering the optimized query plan by taking the balance vector cluster as granularity to obtain an optimal execution sequence;
grouping the optimal execution sequence by using a dynamic programming algorithm to obtain a grouping plan;
and pushing each group into a global grouping queue for transmission and calculation according to the grouping plan, and obtaining a vector query result.
2. The unstructured dataset-based server-unaware vector query method of claim 1, wherein said optimizing the query plan, eliminating redundant transmissions in the query plan, comprises:
adjusting the number of transmissions of each vector cluster to at most one;
and moving forward the calculation order of the vector clusters already resident in the video memory of the GPU.
3. The server non-aware vector query method based on unstructured data sets according to claim 1, wherein the obtaining transmission time information and calculation time information comprises:
calculating the transmission time information of each vector cluster according to the following formula:
transmission time $T_{\mathrm{trans}} = \sum_{i=1}^{m} (a \cdot s_i + b)$, where $s_i$ is the size of the $i$-th balanced vector cluster in the vector cluster, a and b are transmission time parameters obtained in advance through least-squares fitting, and m denotes the number of balanced vector clusters in the vector cluster;
calculating the calculation time information of each vector cluster according to the following formula:
calculation time $T_{\mathrm{calc}} = \sum_{i=1}^{m} (A \cdot s_i + B)$, where A and B are calculation time parameters obtained in advance through least-squares fitting;
the transmission time information represents the time required to transmit each vector cluster from the main memory into the video memory of the GPU, and the calculation time information represents the time required to calculate the corresponding query requests on each vector cluster.
4. The server non-aware vector query method based on unstructured data sets of claim 1, wherein the vector clusters are divided into a plurality of balanced vector clusters by the steps of:
dividing each vector cluster according to the size of the minimum vector cluster to obtain a plurality of candidate vector clusters;
determining whether the variance of the size of the candidate vector cluster is smaller than the preset variance value;
determining the candidate vector cluster as the balance vector cluster under the condition that the variance of the size of the candidate vector cluster is smaller than the preset variance value;
and, under the condition that the variance of the sizes of the candidate vector clusters is larger than or equal to the preset variance value, recursively halving the size of the minimum vector cluster, dividing again with the halved size to generate new candidate vector clusters, until the variance of the sizes of the candidate vector clusters is smaller than the preset variance value.
5. The server non-aware vector query method based on unstructured data sets according to claim 1, wherein the acquiring transmission time information and calculation time information, and reordering the optimized query plan with the balanced vector cluster as granularity, comprises:
moving balanced vector clusters whose transmission time is zero to the front of the execution order according to the transmission time information;
and moving balanced vector clusters with larger required calculation times earlier in the execution order according to the calculation time information.
6. The server non-aware vector query method based on unstructured data sets according to claim 1, wherein said grouping the optimal execution order using a dynamic programming algorithm to obtain a grouping plan comprises:
traversing nodes of a search tree by using a dynamic programming algorithm, wherein each node represents a grouping scheme, and sub-nodes of each node represent sub-schemes of the grouping scheme;
predicting the time interval required by the grouping scheme of the current node, calculated by assuming that the remaining transmission and calculation overlap completely and ignoring pipeline processing overhead;
pruning the current node and its child nodes when the minimum value of its time interval is larger than the minimum time interval of the other nodes;
and determining a grouping scheme corresponding to the finally remaining nodes in the search tree after pruning as the grouping plan.
7. The server non-aware vector query method based on unstructured data sets of claim 1, wherein the obtaining a batch query request using a server non-aware declarative application programming interface comprises:
receiving query request information input by a user, the query request information at least comprising: the query vectors of the batch request, the search precision requirement, the expected query processing time, and the number of nearest-neighbour vectors to return;
converting the query request information into a search configuration and a resource configuration through the server-unaware declarative application programming interface;
and executing the search configuration and the resource configuration to generate the batch query requests.
8. The server non-aware vector query method based on unstructured data sets according to claim 1, wherein pushing each group into a global packet queue for transmission and calculation according to the packet plan to obtain a vector query result comprises:
extracting a transmission task from the global packet queue through a storage manager, and maintaining a local packet transmission queue;
after completing the transmission task of the current packet in the local packet transmission queue, the storage manager notifies the kernel controller to execute the corresponding calculation task and begins executing the transmission task of the next packet;
extracting a calculation task from the global packet queue through the kernel controller, and maintaining a local packet calculation queue;
and according to the local grouping calculation queue, the kernel controller executes the calculation task of the current grouping, and after all the grouping is executed, the kernel controller generates a vector query result.
9. The unstructured dataset-based server non-awareness vector query method of claim 8, wherein the kernel controller performs the currently grouped computing tasks comprising:
dividing each balanced vector cluster in the current group into a plurality of sub-vector clusters according to the address characteristics of the balanced vector cluster;
taking the calculation of the vector query request on each sub-vector cluster as a block calculation task, and averagely distributing the block calculation task to a plurality of stream processors on the GPU;
each of the stream processors performs the assigned block computation task.
10. A server non-aware vector query system based on unstructured data sets, the system comprising:
a server-unaware declarative application programming interface for acquiring batch query requests, the batch query requests comprising a plurality of vector query requests;
the vector database is used for searching, by using an IVF index constructed offline on an unstructured data set, a plurality of vector clusters corresponding to the batch query requests, and generating a query plan, where the query plan represents: a transmission order in which the plurality of vector clusters are transmitted from a main memory into a video memory of a GPU, and a calculation order in which the vector query requests are calculated on the vector clusters; wherein each of the vector clusters is divided into a plurality of balanced vector clusters, the balanced vector clusters being vector clusters, obtained by dividing the vector cluster, whose size variance is smaller than a preset variance value;
The query plan optimizing module is used for optimizing the query plan, eliminating redundant transmission in the query plan and obtaining an optimized query plan;
the pipeline scheduler is used for acquiring transmission time information and calculation time information, and reordering the optimized query plan by taking the balance vector cluster as granularity to obtain an optimal execution sequence;
the pipeline scheduler is further used for grouping the optimal execution sequence by using a dynamic programming algorithm to obtain a grouping plan;
and the GPU processor is used for pushing each group into a global grouping queue for transmission and calculation according to the grouping plan to obtain a vector query result.
CN202310763804.6A 2023-06-27 2023-06-27 Non-perception vector query method and system for server based on unstructured data set Active CN116501828B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310763804.6A CN116501828B (en) 2023-06-27 2023-06-27 Non-perception vector query method and system for server based on unstructured data set


Publications (2)

Publication Number Publication Date
CN116501828A CN116501828A (en) 2023-07-28
CN116501828B true CN116501828B (en) 2023-09-12

Family

ID=87320593

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310763804.6A Active CN116501828B (en) 2023-06-27 2023-06-27 Non-perception vector query method and system for server based on unstructured data set

Country Status (1)

Country Link
CN (1) CN116501828B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115203383A (en) * 2021-04-13 2022-10-18 澜起科技股份有限公司 Method and apparatus for querying similarity vectors in a set of candidate vectors

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2835940A1 (en) * 2002-02-08 2003-08-15 Thomson Licensing Sa Method for execution of nearest neighbor queries in database applications using a vector request of use in indexing of video sequences and images within a multimedia database
CN113032427A (en) * 2021-04-12 2021-06-25 中国人民大学 Vectorization query processing method for CPU and GPU platform
CN114817717A (en) * 2022-04-21 2022-07-29 国科华盾(北京)科技有限公司 Search method, search device, computer equipment and storage medium
CN116166690A (en) * 2023-03-03 2023-05-26 杭州电子科技大学 Mixed vector retrieval method and device for high concurrency scene

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
What Serverless Computing Is and Should Become: The Next Phase of Cloud Computing; JOHANN SCHLEIER-SMITH et al.; Communications of the ACM; Vol. 64, No. 5; pp. 76-84 *

Also Published As

Publication number Publication date
CN116501828A (en) 2023-07-28

Similar Documents

Publication Publication Date Title
WO2023240845A1 (en) Distributed computation method, system and device, and storage medium
US8959138B2 (en) Distributed data scalable adaptive map-reduce framework
CN114138486B (en) Method, system and medium for arranging containerized micro-services for cloud edge heterogeneous environment
TWI547817B (en) Method, system and apparatus of planning resources for cluster computing architecture
CN109150738B (en) Industrial internet resource management method and system, readable storage medium and terminal
US20170091668A1 (en) System and method for network bandwidth aware distributed learning
CN102609303B (en) Slow-task dispatching method and slow-task dispatching device of Map Reduce system
CN110347515B (en) Resource optimization allocation method suitable for edge computing environment
CN108270805B (en) Resource allocation method and device for data processing
CN114418127B (en) Machine learning calculation optimization method and platform
CN110308984B (en) Cross-cluster computing system for processing geographically distributed data
CN106874067B (en) Parallel computing method, device and system based on lightweight virtual machine
Li et al. An effective scheduling strategy based on hypergraph partition in geographically distributed datacenters
US20230136661A1 (en) Task scheduling for machine-learning workloads
CN110990154A (en) Big data application optimization method and device and storage medium
US20210390405A1 (en) Microservice-based training systems in heterogeneous graphic processor unit (gpu) cluster and operating method thereof
CN113448714B (en) Computing resource control system based on cloud platform
CN114327811A (en) Task scheduling method, device and equipment and readable storage medium
CN116501828B (en) Non-perception vector query method and system for server based on unstructured data set
CN115879543B (en) Model training method, device, equipment, medium and system
CN112114951A (en) Bottom-up distributed scheduling system and method
CN116996941A (en) Calculation force unloading method, device and system based on cooperation of cloud edge ends of distribution network
Wang et al. Improved intermediate data management for mapreduce frameworks
US20210397485A1 (en) Distributed storage system and rebalancing processing method
CN108228323A (en) Hadoop method for scheduling task and device based on data locality

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant