CN116501828B

CN116501828B - Server-less vector query method and system based on unstructured data sets

Info

Publication number: CN116501828B
Application number: CN202310763804.6A
Authority: CN
Inventors: 金鑫; 刘譞哲; 章梓立
Original assignee: Peking University
Current assignee: Peking University
Priority date: 2023-06-27
Filing date: 2023-06-27
Publication date: 2023-09-12
Anticipated expiration: 2043-06-27
Also published as: CN116501828A

Abstract

The invention provides a server non-perception vector query method and a system based on unstructured data sets, wherein the method is applied to the technical field of vector query, and comprises the following steps: acquiring a batch query request, wherein the batch query request comprises a plurality of vector query requests; searching a plurality of vector clusters corresponding to the batch query requests to generate a query plan; wherein each vector cluster is divided into a plurality of balanced vector clusters; optimizing the query plan, eliminating redundant transmission in the query plan, and obtaining the optimized query plan; acquiring transmission time information and calculation time information, and reordering the optimized query plan by taking the balance vector cluster as granularity to obtain an optimal execution sequence; grouping the optimal execution sequence by using a dynamic programming algorithm to obtain a grouping plan; and pushing each group into a global grouping queue for transmission and calculation according to a grouping plan, and obtaining a vector query result.

Description

Server-aware vector query method and system based on unstructured data sets

技术领域Technical Field

本发明涉及向量查询技术领域，特别是基于非结构化数据集的服务器无感知向量查询方法和系统。The present invention relates to the technical field of vector query, and in particular to a server-unaware vector query method and system based on unstructured data sets.

背景技术Background Art

向量检索是指将非结构数据转化为高维的特征向量，进行查询、计算和存储的技术。目前，向量检索技术广泛用于人脸识别、信息检索和推荐系统等人工智能领域。在现有技术中，图形处理器（Graphics Processing Unit，GPU）作为一个高度并行化的协处理器，是处理向量操作，执行向量查询任务的天然选择。由于GPU显存容量远远小于主机内存，为了更好的利用GPU的计算资源，克服其显存不够的缺陷，在实际向量检索应用中，往往将主机内存作为GPU显存的拓展存储来使用。Vector retrieval refers to the technology of converting unstructured data into high-dimensional feature vectors for query, calculation and storage. At present, vector retrieval technology is widely used in artificial intelligence fields such as face recognition, information retrieval and recommendation systems. In the existing technology, the graphics processing unit (GPU), as a highly parallelized coprocessor, is a natural choice for processing vector operations and performing vector query tasks. Since the GPU video memory capacity is much smaller than the host memory, in order to better utilize the GPU's computing resources and overcome its insufficient video memory, in actual vector retrieval applications, the host memory is often used as an extended storage of the GPU video memory.

然而，上述GPU结合主机内存的技术方案在实际的向量查询应用中，大大增加了从主机内存到GPU显存的数据传输开销，无法充分利用GPU上的计算资源，导致向量查询效率低，所需时间过长，造成一定的传输资源和计算资源的浪费。However, in actual vector query applications, the above-mentioned technical solution of combining GPU with host memory greatly increases the data transmission overhead from host memory to GPU video memory, and cannot fully utilize the computing resources on the GPU, resulting in low vector query efficiency and excessively long time required, causing a certain amount of waste of transmission resources and computing resources.

因此，有必要开发一种基于非结构化数据集的服务器无感知向量查询方法和系统，以提高向量查询效率，达到更高的计算性能和性价比。Therefore, it is necessary to develop a server-insensitive vector query method and system based on unstructured data sets to improve the efficiency of vector query and achieve higher computing performance and cost-effectiveness.

发明内容Summary of the invention

鉴于上述问题，本发明实施例提供了一种基于非结构化数据集的服务器无感知向量查询方法和系统，以便克服上述问题或者至少部分地解决上述问题。In view of the above problems, an embodiment of the present invention provides a server-unaware vector query method and system based on an unstructured data set, so as to overcome the above problems or at least partially solve the above problems.

本发明实施例第一方面提供了一种基于非结构化数据集的服务器无感知向量查询方法，所述方法包括：A first aspect of an embodiment of the present invention provides a server-unaware vector query method based on an unstructured data set, the method comprising:

获取批量查询请求，所述批量查询请求包括多个向量查询请求；Obtaining a batch query request, where the batch query request includes multiple vector query requests;

利用线下基于非结构数据集构建的IVF索引，查找得到与所述批量查询请求对应的多个向量簇，生成查询计划，所述查询计划表示：将所述多个向量簇从主存传输至GPU的显存中的传输顺序，和，在所述向量簇中计算所述向量查询请求的计算顺序；其中，每个所述向量簇被划分为多个平衡向量簇；Using an IVF index constructed offline based on an unstructured data set, multiple vector clusters corresponding to the batch query request are searched and a query plan is generated, wherein the query plan represents: a transmission order of the multiple vector clusters from the main memory to the video memory of the GPU, and a calculation order of calculating the vector query request in the vector clusters; wherein each of the vector clusters is divided into multiple balanced vector clusters;

优化所述查询计划，消除所述查询计划中的冗余传输，得到优化后查询计划；Optimizing the query plan, eliminating redundant transmissions in the query plan, and obtaining an optimized query plan;

获取传输时间信息和计算时间信息，以所述平衡向量簇为粒度，对所述优化后查询计划进行重排序，得到最优执行顺序；Acquire transmission time information and calculation time information, and reorder the optimized query plan with the balance vector cluster as the granularity to obtain the optimal execution order;

使用动态规划算法，对所述最优执行顺序进行分组，得到分组计划；Using a dynamic programming algorithm, grouping the optimal execution sequence to obtain a grouping plan;

按照所述分组计划，将每个组推入全局分组队列进行传输和计算，得到向量查询结果。According to the grouping plan, each group is pushed into the global grouping queue for transmission and calculation to obtain a vector query result.

本申请实施例第二方面还提供了一种基于非结构化数据集的服务器无感知向量查询系统，所述系统包括：The second aspect of the embodiment of the present application further provides a server-unaware vector query system based on an unstructured data set, the system comprising:

声明式应用编程接口，用于获取批量查询请求，所述批量查询请求包括多个向量查询请求；A declarative application programming interface, used to obtain a batch query request, wherein the batch query request includes a plurality of vector query requests;

向量数据库，用于利用线下基于非结构数据集构建的IVF索引，查找得到与所述批量查询请求对应的多个向量簇，生成查询计划，所述查询计划表示：将所述多个向量簇从主存传输至GPU的显存中的传输顺序，和，在所述向量簇中计算所述向量查询请求的计算顺序；其中，每个所述向量簇被划分为多个平衡向量簇；A vector database, used for searching for multiple vector clusters corresponding to the batch query request by using an IVF index constructed offline based on an unstructured data set, and generating a query plan, wherein the query plan represents: a transmission order of the multiple vector clusters from the main memory to the display memory of the GPU, and a calculation order of calculating the vector query request in the vector clusters; wherein each of the vector clusters is divided into multiple balanced vector clusters;

查询计划优化模块，用于优化所述查询计划，消除所述查询计划中的冗余传输，得到优化后查询计划；A query plan optimization module, used to optimize the query plan, eliminate redundant transmissions in the query plan, and obtain an optimized query plan;

流水线调度器，用于获取传输时间信息和计算时间信息，以所述平衡向量簇为粒度，对所述优化后查询计划进行重排序，得到最优执行顺序；A pipeline scheduler, used for obtaining transmission time information and calculation time information, and reordering the optimized query plan with the balance vector cluster as the granularity to obtain the optimal execution order;

所述流水线调度器还用于使用动态规划算法，对所述最优执行顺序进行分组，得到分组计划；The pipeline scheduler is further used to group the optimal execution order using a dynamic programming algorithm to obtain a grouping plan;

GPU处理器，用于按照所述分组计划，将每个组推入全局分组队列进行传输和计算，得到向量查询结果。The GPU processor is used to push each group into the global grouping queue for transmission and calculation according to the grouping plan to obtain the vector query result.

本发明实施例提供的一种基于非结构化数据集的服务器无感知向量查询方法和系统，该方法包括：获取批量查询请求，所述批量查询请求包括多个向量查询请求；利用线下基于非结构数据集构建的IVF索引，查找得到与所述批量查询请求对应的多个向量簇，生成查询计划，所述查询计划表示：将所述多个向量簇从主存传输至GPU的显存中的传输顺序，和，在所述向量簇中计算所述向量查询请求的计算顺序；其中，每个所述向量簇被划分为多个平衡向量簇；优化所述查询计划，消除所述查询计划中的冗余传输，得到优化后查询计划；获取传输时间信息和计算时间信息，以所述平衡向量簇为粒度，对所述优化后查询计划进行重排序，得到最优执行顺序；使用动态规划算法，对所述最优执行顺序进行分组，得到分组计划；按照所述分组计划，将每个组推入全局分组队列进行传输和计算，得到向量查询结果。一方面，本申请实施例通过对查询计划进行优化，消除查询计划中的冗余传输，提高传输效率，节省传输资源。另一方面，本申请实施例通过对查询计划进行重排序，得到最优执行顺序，利用动态规划算法进行分组，从而按照确定出的最优执行顺序和最优分组方案进行传输和计算，进一步提高了向量查询效率。The embodiment of the present invention provides a server-unaware vector query method and system based on an unstructured data set, the method comprising: obtaining a batch query request, the batch query request comprising multiple vector query requests; using an IVF index constructed offline based on an unstructured data set, finding multiple vector clusters corresponding to the batch query request, and generating a query plan, the query plan representing: the transmission order of the multiple vector clusters from the main memory to the GPU's video memory, and the calculation order of the vector query request in the vector cluster; wherein each of the vector clusters is divided into multiple balanced vector clusters; optimizing the query plan, eliminating redundant transmissions in the query plan, and obtaining an optimized query plan; obtaining transmission time information and calculation time information, reordering the optimized query plan with the balanced vector cluster as the granularity, and obtaining an optimal execution order; using a dynamic programming algorithm to group the optimal execution order to obtain a grouping plan; according to the grouping plan, pushing each group into a global grouping queue for transmission and calculation to obtain a vector query result. On the one hand, the embodiment of the present application optimizes the query plan, eliminates redundant transmissions in the query plan, improves transmission efficiency, and saves transmission resources. On the other hand, the embodiment of the present application reorders the query plan to obtain the optimal execution order, uses a dynamic programming algorithm to group, and then transmits and calculates according to the determined optimal execution order and optimal grouping scheme, thereby further improving the efficiency of vector query.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

为了更清楚地说明本发明实施例的技术方案，下面将对本发明实施例的描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动性的前提下，还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings required for use in the description of the embodiments of the present invention will be briefly introduced below. Obviously, the accompanying drawings in the following description are only some embodiments of the present invention. For ordinary technicians in this field, other accompanying drawings can be obtained based on these accompanying drawings without paying creative labor.

图1是本发明实施例提供的一种基于非结构化数据集的服务器无感知向量查询方法的步骤流程图；FIG1 is a flowchart of a method for server-unaware vector query based on an unstructured data set provided by an embodiment of the present invention;

图2是本发明实施例提供的一种初始的查询计划的示意图；FIG2 is a schematic diagram of an initial query plan provided by an embodiment of the present invention;

图3是本发明实施例提供的一种GPU中的流处理器的计算时间和空间利用率的示意图；3 is a schematic diagram of the computing time and space utilization of a stream processor in a GPU provided by an embodiment of the present invention;

图4是本发明实施例提供的一种批量处理内部的查询计划优化结果示意图；FIG4 is a schematic diagram of a query plan optimization result within a batch process provided by an embodiment of the present invention;

图5是本发明实施例提供的一种批量处理之间的查询计划优化结果示意图；5 is a schematic diagram of a query plan optimization result between batch processing provided by an embodiment of the present invention;

图6是本发明实施例提供的一种流水线调度器的重排序示意图；FIG6 is a schematic diagram of reordering of a pipeline scheduler provided by an embodiment of the present invention;

图7是本发明实施例提供的一种动态内核拓展的计算过程示意图；FIG7 is a schematic diagram of a calculation process of dynamic kernel expansion provided by an embodiment of the present invention;

图8是本发明实施例提供的一种向量查询方法的查询过程示意图；FIG8 is a schematic diagram of a query process of a vector query method provided by an embodiment of the present invention;

图9是本发明实施例提供的基于非结构化数据集的服务器无感知向量查询系统的结构示意图；9 is a schematic diagram of the structure of a server-unaware vector query system based on an unstructured data set provided by an embodiment of the present invention;

图10是本发明实施例提供的一种电子设备的结构示意图。FIG. 10 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present invention.

具体实施方式DETAILED DESCRIPTION

下面将结合本发明实施例中的附图更详细地描述本发明的示例性实施例。虽然附图中显示了本发明的示例性实施例，然而应当理解，可以以各种形式实现本发明而不应被这里阐述的实施例所限制。相反，提供这些实施例是为了能够更透彻地理解本发明，并且能够将本发明的范围完整的传达给本领域的技术人员。The exemplary embodiments of the present invention will be described in more detail below in conjunction with the accompanying drawings in the embodiments of the present invention. Although the exemplary embodiments of the present invention are shown in the accompanying drawings, it should be understood that the present invention can be implemented in various forms and should not be limited by the embodiments set forth herein. On the contrary, these embodiments are provided in order to enable a more thorough understanding of the present invention and to enable the scope of the present invention to be fully communicated to those skilled in the art.

深度学习技术的突破使得非结构化数据能够被转化为高维的特征向量，向量检索也因此被用于服务广泛的人工智能应用，例如：人脸识别、信息检索和推荐系统等。随着用户数据量的急剧增长，目前很多向量检索的数据集大小已达到十亿乃至千亿级别。另外，GPU作为一个高度并行化的协处理器，它是处理向量操作，执行向量检索任务的天然选择。但是GPU不仅仅有着比CPU更高昂的价格，其显存容量也远远小于主机内存，因此在处理大规模向量数据集时，往往需要远超于所需GPU计算资源的GPU数目来放置如此大规模的数据集，导致GPU计算资源的浪费。为了更好的利用GPU的计算资源，克服其显存不够的缺陷，使得其在大规模数据集下依旧有着优于CPU的性能和性价比，最自然的做法就是利用主机内存作为显存的拓展存储。Breakthroughs in deep learning technology have enabled unstructured data to be converted into high-dimensional feature vectors, and vector retrieval has therefore been used to serve a wide range of artificial intelligence applications, such as face recognition, information retrieval, and recommendation systems. With the rapid growth of user data, the size of many vector retrieval data sets has reached billions or even hundreds of billions. In addition, as a highly parallel coprocessor, GPU is a natural choice for processing vector operations and performing vector retrieval tasks. However, GPUs are not only more expensive than CPUs, but their video memory capacity is also much smaller than host memory. Therefore, when processing large-scale vector data sets, it is often necessary to use a number of GPUs that far exceeds the required GPU computing resources to place such a large-scale data set, resulting in a waste of GPU computing resources. In order to better utilize the computing resources of GPUs and overcome the defect of insufficient video memory, so that it still has better performance and cost-effectiveness than CPUs under large-scale data sets, the most natural approach is to use host memory as an extended storage of video memory.

然而，按照上述GPU结合主机内存的技术方案，存在如下问题：一方面，在主机内存到GPU显存的数据传输中存在着严重的冗余传输。当处理一个批量(batch)的查询请求(query)时，不同的查询请求可能会在相同的数据上进行计算，如果按照原有的计算模式进行计算和数据传输，会导致在一个批量的查询中多次传输相同的数据，大大增加了数据传输的开销。另一方面，现有的向量计算引擎在处理每一个查询请求在数据上的计算时，不能充分的利用GPU的计算资源，会导致GPU计算资源的利用率较低。However, according to the above-mentioned technical solution of combining GPU with host memory, there are the following problems: On the one hand, there is serious redundant transmission in the data transmission from host memory to GPU video memory. When processing a batch of query requests, different query requests may be calculated on the same data. If the calculation and data transmission are performed according to the original calculation mode, it will lead to multiple transmissions of the same data in a batch of queries, greatly increasing the data transmission overhead. On the other hand, the existing vector computing engine cannot fully utilize the computing resources of the GPU when processing the calculation of each query request on the data, resulting in low utilization of GPU computing resources.

鉴于上述问题，本发明实施例提出一种基于非结构化数据集的服务器无感知向量查询方法和系统，以解决上述冗余传输问题和计算资源利用率低的问题，以达到更高的计算性能和性价比。下面结合附图，通过一些实施例及其应用场景对本申请实施例提供的向量查询方法进行详细地说明。In view of the above problems, the embodiments of the present invention propose a server-unaware vector query method and system based on unstructured data sets to solve the above redundant transmission problems and the problem of low computing resource utilization, so as to achieve higher computing performance and cost performance. The vector query method provided by the embodiments of the present application is described in detail below through some embodiments and their application scenarios in conjunction with the accompanying drawings.

本实施例提出了一种基于非结构化数据集的服务器无感知向量查询方法，参照图1，图1示出了一种基于非结构化数据集的服务器无感知向量查询方法的步骤流程图，如图1所示，该方法包括：This embodiment proposes a server-unaware vector query method based on an unstructured data set. Referring to FIG. 1 , FIG. 1 shows a flow chart of the steps of a server-unaware vector query method based on an unstructured data set. As shown in FIG. 1 , the method includes:

步骤S101，获取批量查询请求，所述批量查询请求包括多个向量查询请求。Step S101: obtaining a batch query request, where the batch query request includes a plurality of vector query requests.

在具体实施时，批量查询请求中包括多个向量查询请求，在实际查询过程中，需要依据批量查询请求中的每个向量查询请求进行计算，不同的向量查询请求所涉及的向量不同，或者说，所需要查询的向量不同。In specific implementation, a batch query request includes multiple vector query requests. In the actual query process, calculations need to be performed based on each vector query request in the batch query request. Different vector query requests involve different vectors, or in other words, different vectors need to be queried.

在一种可选的实施方式中，所述步骤S101，获取批量查询请求，包括：In an optional implementation, the step S101, obtaining a batch query request, includes:

步骤S1011，接收用户输入的查询请求信息，所述查询请求信息至少包括：批量查询请求的向量、搜索的精度要求、预期查询处理的时间、返回最邻近向量的个数。Step S1011, receiving query request information input by a user, wherein the query request information includes at least: vectors of a batch query request, search accuracy requirements, expected query processing time, and the number of nearest neighbor vectors to be returned.

步骤S1012，通过声明式应用编程接口，将所述查询请求信息转化为搜索配置和资源配置。Step S1012: convert the query request information into search configuration and resource configuration through a declarative application programming interface.

步骤S1013，执行所述搜索配置和所述资源配置，生成所述批量查询请求。Step S1013: execute the search configuration and the resource configuration to generate the batch query request.

在实际应用过程中，用户提交相应的作业（输入本次查询的查询请求信息），通过本实施例的声明式应用编程接口，可以自动地将用户的查询请求信息转化为具体的搜索配置和资源配置，从而执行搜索配置和资源配置，生成批量查询请求。由此，不需要用户手动为本次查询配置相应的资源，系统可以自动地生成示意的搜索配置和资源配置，进而生成批量查询请求。本实施例提出了一组“服务器无感知”的声明式应用编程接口（API），屏蔽了大量资源配置细节（如GPU数量、类型、通信方式等），开发人员只需在这组接口对其模型训练任务作简单的声明（如数据集、精度、预期完成时间），系统即可自动为其进行资源分配和调度，从而大幅度降低了模型训练任务的开发复杂度。In the actual application process, the user submits the corresponding job (inputs the query request information of this query). Through the declarative application programming interface of this embodiment, the user's query request information can be automatically converted into specific search configuration and resource configuration, thereby executing the search configuration and resource configuration, and generating batch query requests. Therefore, there is no need for the user to manually configure the corresponding resources for this query. The system can automatically generate schematic search configuration and resource configuration, and then generate batch query requests. This embodiment proposes a set of "server-unaware" declarative application programming interfaces (APIs), which shield a large number of resource configuration details (such as GPU quantity, type, communication method, etc.). Developers only need to make simple declarations on their model training tasks (such as data sets, accuracy, expected completion time) in this set of interfaces, and the system can automatically allocate and schedule resources for them, thereby greatly reducing the development complexity of model training tasks.

步骤S102，利用线下基于非结构数据集构建的IVF索引，查找得到与所述批量查询请求对应的多个向量簇，生成查询计划，所述查询计划表示：将所述多个向量簇从主存传输至GPU的显存中的传输顺序，和，在所述向量簇中计算所述向量查询请求的计算顺序；其中，每个所述向量簇被划分为多个的平衡向量簇。Step S102, using the IVF index constructed offline based on the unstructured data set, to find multiple vector clusters corresponding to the batch query request, and generate a query plan, the query plan represents: the transmission order of the multiple vector clusters from the main memory to the GPU memory, and, the calculation order of the vector query request in the vector clusters; wherein each of the vector clusters is divided into a plurality of balanced vector clusters.

在本实施例中，预先在向量数据库中构建倒排索引(inverted file system,IVF)，是指对向量数据库中的所有向量进行k-means聚类，将向量分为多个向量簇的技术。具体的，将深度学习中的非结构化数据转化为高维的特征向量，以向量的形式，将大量的非结构化数据集进行存储，得到向量数据库，通过对向量数据库中的向量的查询，实现了对非结构化数据的查询检索。基于预先构建的IVF索引，根据步骤S101获取的批量查询请求进行查询，批量查询请求中的每个向量查询请中包括一个查询向量，可以确定出距离查询向量最近的多个向量簇。由于本实施例预先用k-means聚类将提供的大规模向量数据集划分为多个向量簇，线上查询时只会搜索距离查询向量最近的多个向量簇，不需要逐个遍历所有向量，提高了查询性能，降低延迟。In this embodiment, an inverted file system (IVF) is pre-constructed in a vector database, which refers to a technique of performing k-means clustering on all vectors in the vector database to divide the vectors into multiple vector clusters. Specifically, the unstructured data in deep learning is converted into high-dimensional feature vectors, and a large number of unstructured data sets are stored in the form of vectors to obtain a vector database. By querying the vectors in the vector database, query retrieval of the unstructured data is realized. Based on the pre-constructed IVF index, a query is performed according to the batch query request obtained in step S101, and each vector query request in the batch query request includes a query vector, so that multiple vector clusters closest to the query vector can be determined. Since this embodiment pre-divided the large-scale vector data set provided into multiple vector clusters by k-means clustering, only the multiple vector clusters closest to the query vector will be searched during online query, and all vectors do not need to be traversed one by one, thereby improving query performance and reducing latency.

本实施例通过线下构建的IVF索引，找到所有向量查询请求对应的向量簇，构建一个查询向量-搜索向量簇的矩阵作为查询计划，其中每个查询向量对应一个向量查询请求。该查询计划表示：将所述多个向量簇从主存传输至GPU的显存中的传输顺序，和，在所述向量簇中计算所述向量查询请求的计算顺序。This embodiment uses the IVF index constructed offline to find the vector clusters corresponding to all vector query requests, and constructs a query vector-search vector cluster matrix as a query plan, where each query vector corresponds to a vector query request. The query plan represents: the transmission order of the multiple vector clusters from the main memory to the GPU memory, and the calculation order of the vector query requests in the vector clusters.

参照图2，图2示出了一种初始的查询计划的示意图，如图2所示，横轴的G₁、G₂和G₃表示查询计划的传输顺序，C₁、C₂、C₃、C₄、C₅、C₆表示不同的向量簇，纵轴的Query-1(Q₁)、Query-2(Q₂)、Query-3(Q₃)表示不同的向量查询请求。示例性的，设GPU的显存容量为三个向量簇，即GPU只能存储三个向量簇，根据图2所示的查询计划，可以看出第一次传输G₁是从主机内存中将C₁、C₂、C₃三个向量簇传输至GPU显存中，在C₁中计算向量查询请求Q₁，在C₂中计算Q₂，在C₃中计算Q₃，完成计算后，将三个向量簇驱逐到主机内存中；然后执行第二次传输G₂从主机内存中将C₄、C₅、C₆三个向量簇传输至GPU显存中，在C₄中计算Q₁，在C₅中计算Q₂，在C₆中计算Q₃，完成计算后，将三个向量簇驱逐到主机内存中；第三次传输G₃是从主机内存中将C₁、C₂、C₃三个向量簇再次传输至GPU显存中，在C₃中计算Q₁，在C₁中计算Q₂，在C₂中计算Q₃，完成计算后，将三个向量簇驱逐到主机内存中。从图2可以看出，该初始的查询计划包括了多个向量簇从主存传输至GPU的显存中的传输顺序（例如，先传输C₁、C₂、C₃，再传输C₄、C₅、C₆，最后再传输C₃、C₁、C₂），和，在向量簇中计算向量查询请求的计算顺序（例如在G₃中的计算顺序为：在C₃中计算Q₁，在C₁中计算Q₂，在C₂中计算Q₃）。Referring to FIG. 2 , FIG. 2 shows a schematic diagram of an initial query plan. As shown in FIG. ₂ , G ₁ , G ₂ and G ₃ on the horizontal axis represent the transmission order of the query plan, C 1 , C ₂ , C ₃ , C ₄ , C ₅ , C ₆ represent different vector clusters, and Query-1 (Q ₁ ), Query-2 (Q ₂ ), Query-3 (Q ₃ ) on the vertical axis represent different vector query requests. Exemplarily, assuming that the GPU memory capacity is three vector clusters, that is, the GPU can only store three vector clusters. According to the query plan shown in FIG2 , it can be seen that the first transmission _G1 is to transfer the three vector clusters _C1 , _C2 , and _C3 from the host memory to the GPU memory, calculate the vector query request _Q1 in _C1 , calculate _Q2 in _C2 , and calculate _Q3 in _C3 . After the calculation is completed, the three vector clusters are driven out to the host memory; then the second transmission _G2 is executed to transfer the three vector clusters _C4 , _C5 , and _C6 from the host memory to the GPU memory, calculate _Q1 in _C4 , calculate _Q2 in _C5 , and calculate _Q3 in _C6 . After the calculation is completed, the three vector clusters are driven out to the host memory; the third transmission _G3 is to transfer the three vector clusters _C1 , _C2 , and _C3 from the host memory to the GPU memory again, calculate _Q1 in _C3 , and calculate Q3 in C6. ₁ , calculate Q ₂ , calculate Q ₃ in C ₂ , and after the calculation is completed, the three vector clusters are evicted to the host memory. As can be seen from Figure 2, the initial query plan includes the transfer order of multiple vector clusters from the main memory to the GPU's video memory (for example, first transfer C ₁ , C ₂ , C ₃ , then transfer C ₄ , C ₅ , C ₆ , and finally transfer C ₃ , C ₁ , C ₂ ), and, the calculation order of the vector query request in the vector cluster (for example, the calculation order in G ₃ is: calculate Q ₁ in C ₃ , calculate Q ₂ in C ₁ , and calculate Q ₃ in C ₂ ).

在一种可选的实施方式中，所述向量簇是通过如下步骤，划分为多个平衡向量簇的，所述步骤包括：In an optional implementation, the vector cluster is divided into a plurality of balanced vector clusters by the following steps, the steps comprising:

步骤S1021，按照最小向量簇的大小，对每个所述向量簇进行划分，得到多个候选向量簇；具体的，按照k-means聚类方法构建IVF索引，得到n个向量簇，这n个向量簇的大小并不相同，从中确定出其中最小的一个向量簇。以最小向量簇的大小为单位，对每个向量簇进行划分，得到多个候选向量簇。Step S1021, according to the size of the minimum vector cluster, each vector cluster is divided to obtain multiple candidate vector clusters; specifically, IVF index is constructed according to the k-means clustering method to obtain n vector clusters, the sizes of the n vector clusters are not the same, and the smallest vector cluster is determined. Each vector cluster is divided according to the size of the minimum vector cluster to obtain multiple candidate vector clusters.

步骤S1022，确定所述候选向量簇大小的方差是否小于预设方差值。Step S1022, determining whether the variance of the candidate vector cluster size is less than a preset variance value.

步骤S1023，在所述候选向量簇大小的方差小于所述预设方差值的情况下，将所述候选向量簇确定为所述平衡向量簇。具体的，当候选向量簇的方差小于预设方差值时，表示此时的候选向量簇的大小相近，差别较小，可以作为平衡向量簇。Step S1023, when the variance of the candidate vector cluster size is less than the preset variance value, the candidate vector cluster is determined as the balanced vector cluster. Specifically, when the variance of the candidate vector cluster is less than the preset variance value, it means that the candidate vector clusters are similar in size and have little difference, and can be used as a balanced vector cluster.

步骤S1024，在所述候选向量簇大小的方差大于或等于所述预设方差值的情况下，递归将所述最小向量簇的大小减半，按照减半后的向量簇大小进行划分，生成新的候选向量簇，直至所述候选向量簇大小的方差小于所述预设方差值。具体的，当候选向量簇的方差大于或等于预设方差值，表示候选向量簇之间的大小差值仍然过大，没有收敛至预期范围内，可以按照步骤S1021中的最小向量簇的一半大小为新的划分单位，对向量簇重新进行划分，从而生成新的候选向量簇，然后再次检测候选向量簇的方差，若该方差超出预设方差值，则继续将划分单位减半，重新生成候选向量簇，直至候选向量簇的方差小于了预设方差值。Step S1024, when the variance of the candidate vector cluster size is greater than or equal to the preset variance value, recursively halve the size of the minimum vector cluster, divide it according to the halved vector cluster size, and generate a new candidate vector cluster until the variance of the candidate vector cluster size is less than the preset variance value. Specifically, when the variance of the candidate vector cluster is greater than or equal to the preset variance value, it means that the size difference between the candidate vector clusters is still too large and has not converged to the expected range. The vector cluster can be re-divided according to half the size of the minimum vector cluster in step S1021 as a new division unit to generate a new candidate vector cluster, and then the variance of the candidate vector cluster is detected again. If the variance exceeds the preset variance value, the division unit is further halved to regenerate the candidate vector cluster until the variance of the candidate vector cluster is less than the preset variance value.

在具体实施时，在线下预先构建IVF索引后，将每一个生成的向量簇划分成若干个平衡向量簇，通过计算向量簇的方差值，将方差值小于预设方差值的向量簇确定为平衡向量簇，从而实现了每一个平衡向量簇的大小几乎相同。In the specific implementation, after pre-building the IVF index offline, each generated vector cluster is divided into several balanced vector clusters. By calculating the variance value of the vector cluster, the vector cluster with a variance value less than the preset variance value is determined as the balanced vector cluster, thereby achieving that the size of each balanced vector cluster is almost the same.

在向量查询的过程中，有些常尾的计算块(block)会堵塞整个计算内核(kernel)，使得同一时间只有部分GPU的流处理器(streaming multiprocessor)被利用。参照图3，图3示出了一种GPU中的流处理器的计算时间和空间利用率的示意图，如图3所示，在实际的向量查询过程中，GPU中包括多个流处理器(streaming multiprocessor，SM）（如图3中所示的SM₁、SM₂、SM₃、SM₄），每个流处理器分别在向量簇上进行向量查询请求的计算，如图3所示，SM₁在向量簇C₁上执行向量查询请求Q₁的计算，SM₂在向量簇C₂上执行向量查询请求Q₂的计算，SM₃在向量簇C₃上执行向量查询请求Q₃的计算。需要知道的是，由于k-means聚类算法的不均衡，每一个向量簇的大小都是不一样的，向量簇大小不同，进而导致每个流处理器执行计算所需时间不同，向量簇越大，计算所需时间越长。如图3所示，横轴箭头方向表示时间轴的方向，向量簇C₁远大于C₂和C₃，导致流处理器SM₁计算的时间比SM₂和SM₃进行计算的时间更长，这导致查询请求Q₁在向量簇C₁上的执行时间拖慢整个GPU内核计算，导致流处理器SM₂和SM₃产生了空闲时间，降低了GPU中的流处理器的时间利用率。In the process of vector query, some long tail computing blocks will block the entire computing kernel, so that only some GPU stream processors (streaming multiprocessors) are used at the same time. Referring to FIG3, FIG3 shows a schematic diagram of the computing time and space utilization of a stream processor in a GPU. As shown in FIG3, in the actual vector query process, the GPU includes multiple stream processors (streaming multiprocessors, SM) (SM ₁ , SM ₂ , SM ₃ , SM ₄ as shown in FIG3), and each stream processor performs the calculation of the vector query request on the vector cluster respectively. As shown in FIG3, SM ₁ performs the calculation of the vector query request Q ₁ on the vector cluster C ₁ , SM ₂ performs the calculation of the vector query request Q ₂ on the vector cluster C ₂ , and SM ₃ performs the calculation of the vector query request Q ₃ on the vector cluster C _3. It should be noted that due to the imbalance of the k-means clustering algorithm, the size of each vector cluster is different. The different sizes of vector clusters lead to different calculation times for each stream processor. The larger the vector cluster, the longer the calculation time. As shown in Figure 3, the direction of the horizontal arrow represents the direction of the time axis. The vector cluster _C1 is much larger than _C2 and _C3 , which causes the calculation time of the stream processor _SM1 to be longer than that of _SM2 _and SM3. This causes the execution time of the query request _Q1 on the vector cluster _C1 to slow down the entire GPU kernel calculation, resulting in idle time for the stream processors _SM2 and _SM3 , reducing the time utilization of the stream processors in the GPU.

为了解决上述问题，本申请实施例提出向量簇平衡技术，通过将向量簇预先划分为多个大小相近的平衡向量簇，以平衡向量簇为粒径进行传输和计算，从而避免了查询请求因为在个别向量簇上的执行时间拖慢整个内核计算的情况，解决了GPU流处理器的时间利用率不足的问题。In order to solve the above problems, an embodiment of the present application proposes a vector cluster balancing technology, which pre-divides the vector cluster into multiple balanced vector clusters of similar size, and uses the balanced vector cluster as the granularity for transmission and calculation, thereby avoiding the situation where the query request slows down the entire kernel calculation due to the execution time on individual vector clusters, thereby solving the problem of insufficient time utilization of the GPU stream processor.

步骤S103，优化所述查询计划，消除所述查询计划中的冗余传输，得到优化后查询计划。Step S103, optimizing the query plan, eliminating redundant transmissions in the query plan, and obtaining an optimized query plan.

按照原始的查询计划，容易产生冗余传输，如图2所示，第一次传输G₁是从主机内存中将C₁、C₂、C₃三个向量簇传输至GPU显存中，第二次传输G₂是从主机内存中将C₄、C₅、C₆三个向量簇传输至GPU显存中，第三次传输G₃是从主机内存中将C₁、C₂、C₃三个向量簇再次传输至GPU显存中，可以得出在上述过程中累积传输了9次向量簇，但是交换G₂和G₃的顺序后仅会传输6次向量簇，该初始的查询计划产生了3次的冗余传输，导致C₁、C₂、C₃向量簇被重复多次从主存传输至GPU显存中，浪费了传输资源，影响整体的向量查询效率。本申请实施例提出对查询计划进行优化，消除其中的冗余传输，得到优化后查询计划，使优化后查询计划避免了重复传输同一向量簇，提高了向量簇的传输计算效率。According to the original query plan, redundant transmission is easy to occur. As shown in FIG2 , the first transmission _G1 is to transfer the three vector clusters _C1 , _C2 , and _C3 from the host memory to the GPU memory, the second transmission _G2 is to transfer the three vector clusters _C4 , _C5 , and _C6 from the host memory to the GPU memory, and the third transmission _G3 is to transfer the three vector clusters _C1 , _C2 , and _C3 from the host memory to the GPU memory again. It can be concluded that the vector clusters are cumulatively transmitted 9 times in the above process, but only 6 vector clusters will be transmitted after the order of _G2 and _G3 is exchanged. The initial query plan generates 3 redundant transmissions, resulting in the _C1 , _C2 , and _C3 vector clusters being repeatedly transmitted from the main memory to the GPU memory, wasting transmission resources and affecting the overall vector query efficiency. The embodiment of the present application proposes to optimize the query plan, eliminate the redundant transmission therein, and obtain the optimized query plan, so that the optimized query plan avoids repeated transmission of the same vector cluster and improves the transmission and calculation efficiency of the vector cluster.

在一种可选的实施方式中，所述步骤S103，优化所述查询计划，消除所述查询计划中的冗余传输，包括：In an optional implementation, the step S103, optimizing the query plan to eliminate redundant transmission in the query plan, includes:

步骤S1031，将每个所述向量簇的传输次数调整为最多传输一次。Step S1031, adjusting the number of transmissions of each of the vector clusters to at most once.

步骤S1032，将已经存在于所述GPU的显存中的向量簇的计算顺序向前调整。Step S1032: Adjust the calculation order of the vector clusters that already exist in the video memory of the GPU forward.

在向量查询处理中，每个子集（即每一次在一个向量簇上计算向量查询请求）的处理都是独立的，基于此，能够改进每个查询请求的查询计划，而不改变其查询结果的正确性。本实施例从批量处理内部和批量处理之间两个方面改进查询计划，以完全消除批量查询请求的冗余传输。In vector query processing, the processing of each subset (i.e., each time a vector query request is calculated on a vector cluster) is independent, based on which the query plan of each query request can be improved without changing the correctness of its query result. This embodiment improves the query plan from two aspects: within the batch processing and between batch processing, so as to completely eliminate the redundant transmission of the batch query request.

一方面进行批量处理内部的优化，对应步骤S1031。参照图4，图4示出了一种批量处理内部的查询计划优化结果示意图，如图4所示出的示例，该示例中的向量查询总共会涉及6个向量簇：C₁、C₂、C₃、C₄、C₅、C₆，批量查询请求中包括3个向量查询请求：Q₁、Q₂、Q₃，设图4中的矩阵为M，矩阵M中的每个元素[i，j]表示在对应的向量簇C_j中计算对应的向量查询请求Q_i的计算结果，0表示该向量簇C_j不被向量查询请求Q_i所涉及，即向量簇C_j不是该向量查询请求Q_i所需要的向量；1表示该向量簇C_j被向量查询请求Q_i所涉及，即向量簇C_j是该向量查询请求Q_i所需要的向量。On the one hand, optimization is performed inside the batch processing, corresponding to step S1031. Referring to FIG4 , FIG4 shows a schematic diagram of query plan optimization results inside the batch processing. In the example shown in FIG4 , the vector query in the example involves a total of 6 vector clusters: C ₁ , C ₂ , C ₃ , C ₄ , C ₅ , and C ₆ , and the batch query request includes 3 vector query requests: Q ₁ , Q ₂ , and Q ₃ . Let the matrix in FIG4 be M, and each element [i, j] in the matrix M represents the calculation result of the corresponding vector query request _Qi in the corresponding vector cluster C _j . 0 indicates that the vector cluster C _j is not involved by the vector query request _Qi , that is, the vector cluster C _j is not the vector required by the vector query request _Qi ; 1 indicates that the vector cluster C _j is involved by the vector query request _Qi , that is, the vector cluster C _j is the vector required by the vector query request _Qi .

基于如下观察：查询的最优传输次数总是不少于所涉及的向量簇的数目，所以，优化目标为改进原有的查询计划，使得改进后的查询计划的总传输次数为向量簇的数目并保证结果的正确性。当传输一个向量簇进入GPU显存后，改进后的查询计划会立即处理涉及当前向量簇的查询请求，通过这种方式，优化后查询计划只会传输每个向量簇至多一次，从而消除了冗余的数据传输，实现了批量处理内部的查询计划优化。Based on the following observation: the optimal number of query transmissions is always no less than the number of vector clusters involved, the optimization goal is to improve the original query plan so that the total number of transmissions of the improved query plan is the number of vector clusters and the correctness of the results is guaranteed. After a vector cluster is transmitted into the GPU memory, the improved query plan will immediately process the query request involving the current vector cluster. In this way, the optimized query plan will only transmit each vector cluster at most once, thereby eliminating redundant data transmission and realizing the query plan optimization within the batch processing.

另一方面，进行批量处理请求之间的优化，对应步骤S1032。参照图5，图5示出了一种批量处理之间的查询计划优化结果示意图，如图5所示，假设GPU显存容量为3，最初容纳了两个向量簇：C₅和C₆，即在上一批次处理时，将C₅和C₆传输至了GPU显存中，而恰好下一批次处理时，同样需要对C₅和C₆进行处理。在此情况下，批量处理请求之间的优化可以将已经存在于GPU的显存中的向量簇的计算顺序向前调整，具体可以调整至处理顺序的首位，即将C₅和C₆的处理顺序移动到整个查询请求处理顺序的前面。这样可以充分的重复使用先前批次中已经传输至GPU显存中的向量簇，进一步减少了传输所需开销。On the other hand, optimization is performed between batch processing requests, corresponding to step S1032. Referring to FIG5 , FIG5 shows a schematic diagram of query plan optimization results between batch processing. As shown in FIG5 , assuming that the GPU video memory capacity is 3, it initially accommodates two vector clusters: C ₅ and C ₆ , that is, when the previous batch is processed, C ₅ and C ₆ are transferred to the GPU video memory, and when the next batch is processed, C ₅ and C ₆ also need to be processed. In this case, the optimization between batch processing requests can adjust the calculation order of the vector clusters that already exist in the GPU video memory forward, specifically, it can be adjusted to the first place in the processing order, that is, the processing order of C ₅ and C ₆ is moved to the front of the entire query request processing order. In this way, the vector clusters that have been transferred to the GPU video memory in the previous batch can be fully reused, further reducing the overhead required for transmission.

步骤S104，获取传输时间信息和计算时间信息，以所述平衡向量簇为粒度，对所述优化后查询计划进行重排序，得到最优执行顺序；Step S104, acquiring transmission time information and calculation time information, and reordering the optimized query plan with the balance vector cluster as the granularity to obtain the optimal execution order;

在本实施例中，为了获取传输时间信息和计算时间信息，预先在线下通过查询历史或预处理一些查询构建分析器部件，该分析器可以线上预测每一个查询的数据传输时间或者计算时间，即根据优化后的查询计划，该分析器可以计算预估该批量查询请求在传输数据上所需要消耗的时间，和，在GPU计算上所需要消耗的时间。根据获取到的时间信息，可以进一步的对优化后查询计划进行重排序，以得到最优执行顺序。In this embodiment, in order to obtain the transmission time information and the computing time information, an analyzer component is constructed offline in advance by querying the history or preprocessing some queries. The analyzer can predict the data transmission time or computing time of each query online, that is, according to the optimized query plan, the analyzer can calculate and estimate the time required for the batch query request to transmit data and the time required for GPU computing. According to the obtained time information, the optimized query plan can be further reordered to obtain the optimal execution order.

在一种可选的实施方式中，所述获取传输时间信息和计算时间信息，包括：In an optional implementation manner, the acquiring the transmission time information and the calculation time information includes:

按照如下公式计算每个所述向量簇的所述传输时间信息：The transmission time information of each vector cluster is calculated according to the following formula:

传输时间=，其中，是所述向量簇内的所述平衡向量簇的大小，a和b为预先通过最小二乘法拟合得到的传输时间参数，m表示所述向量簇内的所述平衡向量簇的数量；Transfer time = ,in, is the size of the equilibrium vector cluster within the vector cluster, a and b are transmission time parameters pre-fitted by the least square method, and m represents the number of the equilibrium vector clusters within the vector cluster;

按照如下公式计算每个所述向量簇的所述计算时间信息：The calculation time information of each vector cluster is calculated according to the following formula:

计算时间=，其中，A和B为预先通过最小二乘法拟合得到的计算时间参数；Compute time = , where A and B are the calculation time parameters obtained in advance by least squares fitting;

其中，所述传输时间信息表示每个所述向量簇从所述主存传输至所述GPU的显存所需的时间信息，所述计算时间信息表示在每个所述向量簇中计算对应的查询请求所需的时间信息。The transmission time information represents the time information required for each of the vector clusters to be transmitted from the main memory to the video memory of the GPU, and the calculation time information represents the time information required for calculating the corresponding query request in each of the vector clusters.

在本实施例中，分析器主要用于预测主存到GPU显存通过PCIe传输数据的时间，该传输一般通过调用cuda的接口cudaMemcpyAsync实现。整个传输时间可以被划分为两个部分：真正在PCIe上的传播时间和接口调用开销。具体的，假设当前分组包含个m平衡向量簇，那么传输时间大概可被如下的公式计算：，其中是平衡向量簇的大小，a和b可以在线下通过最小二乘法拟合，即通过历史数据或经验设置的计算时间参数。分析器预测计算时间同上，按照如下公式计算：，其中，A和B为预先通过最小二乘法拟合得到的计算时间参数，其计算时间与计算量成线性函数关系。In this embodiment, the analyzer is mainly used to predict the time of data transmission from main memory to GPU memory through PCIe, which is generally implemented by calling cudaMemcpyAsync, the interface of cuda. The entire transmission time can be divided into two parts: the actual propagation time on PCIe and the interface call overhead. Specifically, assuming that the current group contains m balanced vector clusters, the transmission time can be roughly calculated by the following formula: ,in is the size of the equilibrium vector cluster, a and b can be fitted offline by the least squares method, that is, the calculation time parameters set by historical data or experience. The analyzer predicts the calculation time as above, calculated according to the following formula: , where A and B are the calculation time parameters obtained in advance by least squares fitting, and the calculation time is in a linear function relationship with the calculation amount.

本实施例利用分析器预测每一个分组传输和计算的时间，即预测每个向量簇（包括多个平衡向量簇）的传输和计算的时间，预测得到的时间信息（传输时间信息和计算时间信息）会被流水线调度器使用，辅助计算出最优的流水线计划。具体的，流水线调度器在接收到优化后查询计划后，会针对该计划，进行进一步的重排序和分组，以达到最大的传输计算重叠和最小的流水线开销。最简单的方法就是枚举所有的流水线调度方案，在其中找到运行时间最小的，但是该算法是运行时的枚举会带来极大的算法开销。因此，本申请实施例将整个搜索问题拆成了两个子问题，先执行步骤S104，以平衡向量簇为粒度，对所述优化后查询计划进行重排序，得到最优执行顺序。在平衡向量簇的粒度上确定最优的执行顺序之后，执行步骤S105，在该顺序上搜寻最优的分组计划。This embodiment uses an analyzer to predict the time of transmission and calculation of each group, that is, to predict the time of transmission and calculation of each vector cluster (including multiple balanced vector clusters). The predicted time information (transmission time information and calculation time information) will be used by the pipeline scheduler to assist in calculating the optimal pipeline plan. Specifically, after receiving the optimized query plan, the pipeline scheduler will further reorder and group the plan to achieve the maximum transmission calculation overlap and the minimum pipeline overhead. The simplest method is to enumerate all pipeline scheduling schemes and find the one with the smallest running time, but the enumeration of the algorithm during operation will bring great algorithm overhead. Therefore, the embodiment of the present application splits the entire search problem into two sub-problems. First, step S104 is executed to reorder the optimized query plan with the balanced vector cluster as the granularity to obtain the optimal execution order. After determining the optimal execution order at the granularity of the balanced vector cluster, step S105 is executed to search for the optimal grouping plan in this order.

在一种可选的实施方式中，所述步骤S104，获取传输时间信息和计算时间信息，以所述平衡向量簇为粒度，对所述优化后查询计划进行重排序，包括：In an optional implementation, the step S104, obtaining the transmission time information and the calculation time information, and reordering the optimized query plan with the balance vector cluster as the granularity, includes:

步骤S1041，根据所述传输时间信息，将传输时间为零的平衡向量簇的执行顺序移动至首位；Step S1041, according to the transmission time information, the execution order of the balance vector cluster with a transmission time of zero is moved to the first place;

步骤S1042，根据所述计算时间信息，将所需计算时间更大的平衡向量簇的执行顺序向前调整。Step S1042: According to the calculation time information, the execution order of the balance vector cluster requiring a larger calculation time is adjusted forward.

在具体实施时，参照图6，图6示出了一种流水线调度器的重排序示意图，设所有的平衡向量簇为{B₁，B₂，B₃，……B_m}，代表一个具体的执行顺序，如图6所示，总共有四个平衡向量簇B₁、B₂、B₃、B₄，PCIe表示传输数据，GPU表示对应的内核计算，横向条为时间轴，横向条越长表示该平衡向量簇的传输时间或计算时间越长。如图6中（a）部分所示，B₄已经在GPU中不需要被传输，执行顺序为B₁→B₂→B₃→B₄。每个平衡向量簇需要在完成传输后，才能开始进行计算。In the specific implementation, refer to FIG6 , which shows a reordering schematic diagram of a pipeline scheduler. Assume that all balance vector clusters are {B ₁ , B ₂ , B ₃ , ... B _m }, representing a specific execution order. As shown in FIG6 , there are four balance vector clusters B ₁ , B ₂ , B ₃ , and B _{4 . PCIe represents transmission data, and GPU represents the corresponding kernel calculation. The horizontal bar is the time axis. The longer the horizontal bar, the longer the transmission time or calculation time of the balance vector cluster. As shown in part (a) of FIG6 , B 4} _is already in the GPU and does not need to be transmitted. The execution order is B ₁ →B ₂ →B ₃ →B ₄ . Each balance vector cluster needs to be transmitted before it can start calculation.

按照步骤S1041，根据传输时间信息，将传输时间为零的平衡向量簇的执行顺序移动至首位，从而可以增大传输和计算的重叠和并行。如图6中的（b）部分所示，将传输时间为零的平衡向量簇（B₄）的处理顺序移动至首位，从而使得B₄的计算时间可以完全被B₁的传输时间所覆盖。According to step S1041, according to the transmission time information, the execution order of the balance vector cluster with zero transmission time is moved to the first place, so that the overlap and parallelism of transmission and calculation can be increased. As shown in part (b) of Figure 6, the processing order of the balance vector cluster with zero transmission time ( _B4 ) is moved to the first place, so that the calculation time of _B4 can be completely covered by the transmission time of _B1 .

按照步骤S1042，根据计算时间信息，将所需计算时间更大的平衡向量簇的执行顺序向前调整，将计算时间更大的平衡向量簇移动到传输顺序的首位以及执行顺序的前方。如图6中的（c）部分所示，由于在该分组中，B₃的计算时间更长，将B₃移动到传输顺序的首位，首先将其传输至GPU显存中，这样将其的执行顺序也向前移动，可选地，可使其位于，步骤S1041调整的传输时间为零的平衡向量簇的执行顺序的后一位（例如图6中的B₄的后一位），这样可以使得所需计算时间更大的平衡向量簇（B₃）的计算时间可以更多的被其他平衡向量簇的传输时间所覆盖。上述步骤S1041和步骤S1042，可以通过对应的排序算法实现，以求解得到最优执行顺序。According to step S1042, according to the calculation time information, the execution order of the balance vector cluster with a larger calculation time is adjusted forward, and the balance vector cluster with a larger calculation time is moved to the first position of the transmission order and the front of the execution order. As shown in part (c) of Figure 6, since in this group, the calculation time of B ₃ is longer, B ₃ is moved to the first position of the transmission order and first transmitted to the GPU memory, so that its execution order is also moved forward. Optionally, it can be placed after the execution order of the balance vector cluster with a zero transmission time adjusted in step S1041 (for example, after B ₄ in Figure 6), so that the calculation time of the balance vector cluster (B ₃ ) with a larger calculation time can be covered more by the transmission time of other balance vector clusters. The above steps S1041 and S1042 can be implemented by a corresponding sorting algorithm to solve and obtain the optimal execution order.

现有的传输和计算的并行模式会导致流水线中出现气泡(bubble)，使得计算和传输不能充分的重叠。例如图6中的（a）部分，GPU中的流处理器需要等待B₁通过PCIe传输完成后，才能开始进行计算，由此产生的等待时间即为气泡。为了解决该问题，本实施例通过调整每一个向量簇中的平衡向量簇的执行顺序（传输顺序和计算顺序），计算得到最优执行顺序，使得平衡向量簇的计算时间能够尽可能的被其他平衡向量簇的传输时间所覆盖，减少其中的气泡，提高向量查询效率。The existing parallel mode of transmission and calculation will cause bubbles in the pipeline, so that the calculation and transmission cannot be fully overlapped. For example, in part (a) of Figure 6, the stream processor in the GPU needs to wait for _B1 to be transmitted through PCIe before it can start calculation. The waiting time generated is the bubble. In order to solve this problem, this embodiment adjusts the execution order (transmission order and calculation order) of the balanced vector clusters in each vector cluster, calculates the optimal execution order, so that the calculation time of the balanced vector cluster can be covered by the transmission time of other balanced vector clusters as much as possible, reducing bubbles and improving vector query efficiency.

步骤S105，使用动态规划算法，对所述最优执行顺序进行分组，得到分组计划。Step S105: Use a dynamic programming algorithm to group the optimal execution sequence to obtain a grouping plan.

在平衡向量簇的粒度上确定最优的执行顺序之后，流水线调度器在该顺序基础上，利用动态规划算法搜寻最优的分组计划。在相关技术中，基本的流水线策略是按向量簇一个一个进行流水线处理，即逐个簇的进行传输和计算。然而，这种细粒度流水线处理引入了两类流水线开销。第一个是频繁调用传输接口的开销，由于传输数据量是固定的，按向量簇粒度导致最大的接口调用次数，因此具有最大的调用开销。另一个开销是每个分组传输和计算之间的同步开销，按簇粒度需要为每个向量簇进行同步，具有最大的同步开销。为了减少上述两个方面的开销，本实施例提出使用动态规划算法在可能的方案组合中找到最优的分组方案。After determining the optimal execution order at the granularity of the balanced vector cluster, the pipeline scheduler searches for the optimal grouping plan based on this order using a dynamic programming algorithm. In the related art, the basic pipeline strategy is to perform pipeline processing one by one according to the vector clusters, that is, to transmit and calculate cluster by cluster. However, this fine-grained pipeline processing introduces two types of pipeline overhead. The first is the overhead of frequently calling the transmission interface. Since the amount of data transmitted is fixed, the vector cluster granularity leads to the maximum number of interface calls, and therefore has the largest call overhead. Another overhead is the synchronization overhead between each packet transmission and calculation. At the cluster granularity, each vector cluster needs to be synchronized, which has the largest synchronization overhead. In order to reduce the overhead in the above two aspects, this embodiment proposes to use a dynamic programming algorithm to find the optimal grouping scheme among possible scheme combinations.

在一种可选的实施方式中，所述步骤S105，使用动态规划算法，对所述最优执行顺序进行分组，得到分组计划，包括：In an optional implementation, the step S105, using a dynamic programming algorithm to group the optimal execution order to obtain a grouping plan, includes:

步骤S1051，使用动态规划算法，遍历搜索树的节点，每个所述节点表征一种分组方案，每个节点的子节点表示所述分组方案的子方案。Step S1051, using a dynamic programming algorithm, traverse the nodes of the search tree, each of the nodes represents a grouping scheme, and the child nodes of each node represent the sub-schemes of the grouping scheme.

在具体实施时，利用动态规划算法维护一个全局变量——最优时间，用于记录当前搜索空间中的流水线方案的最佳时间。将每一种流水线方案作为搜索树中的一个节点，例如，将“将B₃移动到传输顺序的首位”作为一个搜索树的节点，然后将“将B₃移动到传输顺序的首位，将B₂设为B₃的传输顺序的后一位”作为该节点的一个子节点。In the specific implementation, a global variable, optimal time, is maintained using a dynamic programming algorithm to record the optimal time of the pipeline solution in the current search space. Each pipeline solution is taken as a node in the search tree. For example, "Move B ₃ to the first position in the transmission order" is taken as a node in the search tree, and then "Move B ₃ to the first position in the transmission order and set B ₂ to the next position in the transmission order of B ₃ " is taken as a child node of the node.

步骤S1052，通过完全重叠剩余传输和计算，忽略流水线处理开销，计算得到执行当前节点的分组方案预计所需的时间区间。Step S1052, by completely overlapping the remaining transmission and calculation, ignoring the pipeline processing overhead, calculate the time interval expected to be required to execute the grouping solution of the current node.

具体的，在遍历搜索树中的节点时，对于每个节点预估对应的分组方案执行所需的时间区间，该时间区间表示执行该分组方案预计所需的最长时间和最短时间。在本实施例中，可以通过完全重叠剩余传输和计算，并忽略流水线处理开销来计算出当前子树中方案对应的时间区间。Specifically, when traversing the nodes in the search tree, the time interval required for executing the corresponding grouping scheme is estimated for each node, and the time interval represents the maximum time and the shortest time expected to execute the grouping scheme. In this embodiment, the time interval corresponding to the scheme in the current subtree can be calculated by completely overlapping the remaining transmission and calculation and ignoring the pipeline processing overhead.

步骤S1053，在所述时间区间的最小值，大于其他节点的时间区间的最小值的情况下，将所述当前节点及其子节点进行剪枝。Step S1053, when the minimum value of the time interval is greater than the minimum value of the time interval of other nodes, prune the current node and its child nodes.

在本实施例中，在动态规划算法中结合了启发式的剪枝方法，如果该节点的时间区间的最小值大于了维护的最优时间（即其他节点的时间区间的最小值），则该节点以及该节点所对应的所有子节点（子树）被剪枝，不需要在继续计算该节点的所有子节点，进一步减少动态规划的搜索空间，节省了计算资源。In this embodiment, a heuristic pruning method is combined with the dynamic programming algorithm. If the minimum value of the time interval of the node is greater than the optimal maintenance time (i.e., the minimum value of the time interval of other nodes), the node and all child nodes (subtrees) corresponding to the node are pruned, and there is no need to continue calculating all child nodes of the node, further reducing the search space of dynamic programming and saving computing resources.

步骤S1054，将剪枝后所述搜索树中最终剩余的节点所对应的分组方案，确定为所述分组计划。动态规划算法的时间复杂度是多项式级别，带有剪枝启发式的算法可以在运行时快速找到最优的分组方案，最小化流水线带来的额外系统开销。Step S1054, the grouping scheme corresponding to the nodes finally remaining in the search tree after pruning is determined as the grouping plan. The time complexity of the dynamic programming algorithm is at the polynomial level. The algorithm with pruning heuristic can quickly find the optimal grouping scheme at runtime and minimize the additional system overhead caused by the pipeline.

步骤S106，按照所述分组计划，将每个组推入全局分组队列进行传输和计算，得到向量查询结果。Step S106: according to the grouping plan, each group is pushed into a global grouping queue for transmission and calculation to obtain a vector query result.

在一种可选的实施方式中，所述步骤S106，按照所述分组计划，将每个组推入全局分组队列进行传输和计算，得到向量查询结果，包括：In an optional implementation, the step S106, according to the grouping plan, pushes each group into a global grouping queue for transmission and calculation to obtain a vector query result, including:

步骤S1061，通过存储管理器从所述全局分组队列中提取传输任务，维护本地分组传输队列；Step S1061, extracting transmission tasks from the global packet queue through the storage manager, and maintaining a local packet transmission queue;

步骤S1062，根据所述本地分组传输队列，所述存储管理器在执行完成当前分组的传输任务后，通知内核控制器执行对应的计算任务，并开始执行下一分组的传输任务；Step S1062, according to the local packet transmission queue, after completing the transmission task of the current packet, the storage manager notifies the kernel controller to execute the corresponding computing task and starts to execute the transmission task of the next packet;

步骤S1063，通过所述内核控制器从所述全局分组队列中提取计算任务，维护本地分组计算队列；Step S1063, extracting computing tasks from the global grouping queue through the kernel controller, and maintaining a local grouping computing queue;

步骤S1064，根据所述本地分组计算队列，所述内核控制器执行当前分组的计算任务，在执行完成所有分组后，所述内核控制器生成向量查询结果。Step S1064: the core controller executes the calculation task of the current group according to the local group calculation queue, and after completing the execution of all groups, the core controller generates a vector query result.

在一种可能的实施方式中，所述步骤S1064，内核控制器执行当前分组的计算任务，包括：In a possible implementation, in step S1064, the kernel controller executes the computing task of the current group, including:

根据平衡向量簇地址的特征，将当前分组中的每个所述平衡向量簇划分为多个子向量簇；According to the characteristics of the balance vector cluster address, each of the balance vector clusters in the current group is divided into a plurality of sub-vector clusters;

将所述向量查询请求在每个所述子向量簇上的计算作为一个块计算任务，将所述块计算任务平均分配给所述GPU上的多个流处理器；The calculation of the vector query request on each of the sub-vector clusters is regarded as a block calculation task, and the block calculation tasks are evenly distributed to multiple stream processors on the GPU;

每个所述流处理器执行分配的所述块计算任务。Each of the stream processors executes the assigned block computing task.

在进行向量查询时，不仅存在时间利用率低的问题，还会存在空间利用率低的问题。具体的，GPU中存在多个流处理器，容易出现整个计算内核的计算块的数目较小，无法占满GPU的所有流处理器的情况，即，当前批次的查询请求对应的内核计算没法用满所有流处理器，如图3中的SM₄所示，会存在个别流处理器（SM₄）处于空闲状态，导致空间的低利用率。When performing vector queries, there is not only the problem of low time utilization, but also the problem of low space utilization. Specifically, there are multiple stream processors in the GPU, and it is easy for the number of computing blocks of the entire computing kernel to be small and unable to fully occupy all the stream processors of the GPU, that is, the kernel calculation corresponding to the query request of the current batch cannot fully utilize all the stream processors, as shown in SM ₄ in Figure 3, and some stream processors (SM ₄ ) will be idle, resulting in low space utilization.

为了解决上述空间利用率低的问题，本实施例提出了动态内核拓展的方法：利用了平衡后连续的向量簇地址的特征，即平衡向量簇内的向量的地址是连续的这一特征，在运行时动态的将一个平衡向量簇快速的划分成为若干个子向量簇。原本的一个块计算（block），即一个向量查询请求在一个平衡向量簇上的计算，会被同样划分为若干个更小的块计算（block），每个块计算都会在一个流处理器中运行，每个流处理器可以容纳多个块计算运行。参照图7，图7示出了一种动态内核拓展的计算过程示意图，如图7所示，横轴为时间轴发展方向，每一个块表示一个子向量簇的计算，将划分后的若干个子向量簇平均分配给GPU中的流处理器，从而保证每个流处理器的任务量相近，实现了动态的内核拓展。In order to solve the problem of low space utilization mentioned above, this embodiment proposes a method for dynamic kernel expansion: utilizing the feature of the continuous vector cluster address after balancing, that is, the address of the vector in the balanced vector cluster is continuous, and dynamically dividing a balanced vector cluster into several sub-vector clusters at runtime. The original block calculation (block), that is, the calculation of a vector query request on a balanced vector cluster, will also be divided into several smaller block calculations (blocks), each block calculation will be run in a stream processor, and each stream processor can accommodate multiple block calculations. Referring to Figure 7, Figure 7 shows a schematic diagram of the calculation process of dynamic kernel expansion. As shown in Figure 7, the horizontal axis is the direction of the time axis development, and each block represents the calculation of a sub-vector cluster. The divided sub-vector clusters are evenly distributed to the stream processors in the GPU, thereby ensuring that the task volume of each stream processor is similar, and realizing dynamic kernel expansion.

本申请实施例在内核控制器根据本地分组计算队列，执行当前分组的计算任务之前，将当前分组中的平衡向量簇划分为若干个子向量簇，根据当前流处理器的繁忙程度，将若干个子向量簇平均分配给GPU中的多个流处理器中，使流处理器进行计算，从而保证了不会存在空闲的流处理器，提高了GPU的占用率。In the embodiment of the present application, before the kernel controller executes the calculation task of the current group according to the local group calculation queue, the balance vector cluster in the current group is divided into a plurality of sub-vector clusters, and according to the busyness of the current stream processor, the plurality of sub-vector clusters are evenly distributed to a plurality of stream processors in the GPU, so that the stream processors perform calculations, thereby ensuring that there are no idle stream processors and improving the GPU occupancy rate.

示例一、Example 1:

参照图8，图8示出了一种向量查询方法的查询过程示意图，如图8所示，该向量查询方法主要分为线上和线下两个部分。Referring to FIG. 8 , FIG. 8 shows a schematic diagram of a query process of a vector query method. As shown in FIG. 8 , the vector query method is mainly divided into two parts: online and offline.

对于线下部分，需要预先为数据集构建IVF(倒排表)索引，IVF使用k-means聚类将数据集的向量分成多个向量簇。并且，对于生成的多个向量簇，通过向量簇平衡模块扩展索引，将每个向量簇划分为多个大小相近的平衡向量簇。数据集随后被加载到主机内存中，并在存储管理系统的控制下在线通过PCIe传输到GPU显存中。For the offline part, it is necessary to build an IVF (inverted index) index for the dataset in advance. IVF uses k-means clustering to divide the vectors of the dataset into multiple vector clusters. In addition, for the generated multiple vector clusters, the vector cluster balancing module expands the index and divides each vector cluster into multiple balanced vector clusters of similar size. The dataset is then loaded into the host memory and transferred to the GPU memory online through PCIe under the control of the storage management system.

此外，本实施例现在还预先设计了一个分析器部件，该分析器在线下通过查询历史或预处理一些查询构建，可以用于线上预测每一个查询的数据传输时间或计算时间。In addition, the present embodiment now also pre-designs an analyzer component, which is constructed offline by querying history or pre-processing some queries, and can be used to predict the data transmission time or calculation time of each query online.

对于线上部分，按照向量查询方法，具体步骤如下：For the online part, according to the vector query method, the specific steps are as follows:

步骤1：接收来自用户的在线的批量查询请求。具体的，可以通过声明式应用编程接口将用户输入的查询请求信息转化为对应的搜索配置和资源配置，生成批量查询请求，Step 1: Receive online batch query requests from users. Specifically, the query request information input by the user can be converted into corresponding search configuration and resource configuration through the declarative application programming interface to generate batch query requests.

步骤2：通过线下构建的IVF索引，选取最近的n个向量簇作为查询的数据，生成查询计划。Step 2: Use the IVF index built offline to select the nearest n vector clusters as query data and generate a query plan.

步骤3：对整个查询计划进行基于向量簇的优化，消除查询计划中的冗余传输，得到优化后查询计划。Step 3: Optimize the entire query plan based on vector clusters to eliminate redundant transmissions in the query plan and obtain an optimized query plan.

步骤4：运行时流水线调度器接收改进后的查询计划，以及，分析器提供的传输时间信息和计算时间信息。流水线调度器根据上述信息，调度程序使用贪心算法重新排序查询计划，得到最优执行顺序。并且，流水线调度器还可以使用动态规划算法对最优执行顺序进行分组，得到分组计划，流水线调度器会找到流水线效率和流水线开销之间的最佳权衡。Step 4: During runtime, the pipeline scheduler receives the improved query plan, as well as the transmission time information and computation time information provided by the analyzer. Based on the above information, the pipeline scheduler uses a greedy algorithm to reorder the query plan to obtain the optimal execution order. In addition, the pipeline scheduler can also use a dynamic programming algorithm to group the optimal execution order to obtain a grouped plan. The pipeline scheduler will find the best trade-off between pipeline efficiency and pipeline overhead.

步骤5：将分组计划中的每个组推入全局分组队列进行传输和计算。Step 5: Push each group in the grouping plan into the global grouping queue for transmission and calculation.

步骤6：内核控制器和存储管理器各自从全局组队列中提取任务，维护对应的本地分组队列。其中，只要其本地分组队列不为空，存储管理器就立即开始传输，在完成当前分组的传输后，存储管理器会通知内核控制器进行相应的计算，并弹出下一个分组开始传输。如果此时内核计算引擎为空，内核控制器会计算当前分组，并采用动态的内核拓展，将平衡向量簇划分为多个子向量簇，充分利用每一个GPU流处理器。Step 6: The kernel controller and storage manager each extract tasks from the global group queue and maintain the corresponding local group queue. As long as its local group queue is not empty, the storage manager will start transmission immediately. After completing the transmission of the current group, the storage manager will notify the kernel controller to perform the corresponding calculation and pop up the next group to start transmission. If the kernel calculation engine is empty at this time, the kernel controller will calculate the current group and use dynamic kernel expansion to divide the balanced vector cluster into multiple sub-vector clusters to make full use of each GPU stream processor.

步骤7：在批量处理中完成所有分组的计算后，生成最终的向量查询结果，内核控制器将向量查询结果返回给用户。Step 7: After completing the calculation of all groups in batch processing, the final vector query result is generated, and the kernel controller returns the vector query result to the user.

本申请实施例提出了基于数据和向量簇的查询计划优化来解决冗余的数据传输问题，通过动态的计算内核拓展附加数据簇的平衡分别解决了GPU的流处理器在空间和时间上未充分利用的问题，还提出利用流水线调度器，在运行时重排序和分组，最大化传输和计算的并行，从而使得本申请实施例能够超越现有的相关向量查询技术，达到更高的计算性能和性价比。The embodiments of the present application propose query plan optimization based on data and vector clusters to solve the problem of redundant data transmission, and solve the problem of underutilization of the GPU's stream processor in space and time by balancing the dynamic computing kernel expansion additional data clusters. It also proposes to use a pipeline scheduler to reorder and group at runtime to maximize the parallelism of transmission and calculation, so that the embodiments of the present application can surpass the existing related vector query technology and achieve higher computing performance and cost-effectiveness.

本申请实施例第二方面提供了一种基于非结构化数据集的服务器无感知向量查询系统，参照图9，图9示出了一种基于非结构化数据集的服务器无感知向量查询系统的结构示意图，如图9所示，所述系统包括：A second aspect of an embodiment of the present application provides a server-unaware vector query system based on an unstructured data set. Referring to FIG. 9 , FIG. 9 shows a schematic structural diagram of a server-unaware vector query system based on an unstructured data set. As shown in FIG. 9 , the system includes:

向量数据库，用于利用线下基于非结构数据集构建的IVF索引，查找得到与所述批量查询请求对应的多个向量簇，生成查询计划，所述查询计划表示：将所述多个向量簇从主存传输至GPU的显存中的传输顺序，和，在所述向量簇中计算所述向量查询请求的计算顺序；其中，每个所述向量簇被划分为多个平衡向量簇；A vector database, used for searching for multiple vector clusters corresponding to the batch query request by using an IVF index constructed offline based on an unstructured data set, and generating a query plan, wherein the query plan represents: a transmission order of transferring the multiple vector clusters from a main memory to a display memory of a GPU, and a calculation order of calculating the vector query request in the vector clusters; wherein each of the vector clusters is divided into multiple balanced vector clusters;

在一种可选的实施方式中，所述查询计划优化模块，包括：In an optional implementation, the query plan optimization module includes:

批量处理内部优化子模块，用于将每个所述向量簇的传输次数调整为最多传输一次；A batch processing internal optimization submodule, used for adjusting the number of transmissions of each vector cluster to at most once;

批量处理外部优化子模块，用于将已经存在于所述GPU的显存中的向量簇的计算顺序向前调整。The batch processing external optimization submodule is used to adjust the calculation order of the vector clusters that already exist in the video memory of the GPU forward.

在一种可选的实施方式中，所述装置，还包括分析器：In an optional embodiment, the device further comprises an analyzer:

所述分析器用于按照如下公式计算每个所述向量簇的所述传输时间信息：The analyzer is used to calculate the transmission time information of each vector cluster according to the following formula:

所述分析器还用于按照如下公式计算每个所述向量簇的所述计算时间信息：The analyzer is further configured to calculate the calculation time information of each vector cluster according to the following formula:

其中，所述传输时间信息表示每个所述向量簇从所述主存传输至所述GPU的显存所需的时间信息，所述计算时间信息表示在每个所述向量簇中计算对应的查询请求所需的时间信息；The transmission time information indicates the time information required for each of the vector clusters to be transmitted from the main memory to the video memory of the GPU, and the calculation time information indicates the time information required for calculating the corresponding query request in each of the vector clusters;

所述分析器还用于将计算得到的所述传输时间信息和所述计算时间信息发送至所述流水线调度器。The analyzer is further configured to send the calculated transmission time information and the calculation time information to the pipeline scheduler.

在一种可选的实施方式中，所述系统还包括：向量簇平衡模块，所述向量簇平衡模块用于将所述向量簇是通过如下步骤，划分为多个大小相近的平衡向量簇的，所述步骤包括：In an optional implementation, the system further includes: a vector cluster balancing module, the vector cluster balancing module is used to divide the vector cluster into a plurality of balanced vector clusters of similar size through the following steps, the steps including:

按照最小向量簇的大小，对每个所述向量簇进行划分，得到多个候选向量簇；Dividing each of the vector clusters according to the size of the minimum vector cluster to obtain a plurality of candidate vector clusters;

确定所述候选向量簇大小的方差是否小于预设方差值；Determining whether the variance of the candidate vector cluster size is less than a preset variance value;

在所述候选向量簇大小的方差小于所述预设方差值的情况下，将所述候选向量簇确定为所述平衡向量簇；In a case where the variance of the size of the candidate vector cluster is less than the preset variance value, determining the candidate vector cluster as the balanced vector cluster;

在所述候选向量簇大小的方差大于或等于所述预设方差值的情况下，递归将所述最小向量簇的大小减半，按照减半后的向量簇大小进行划分，生成新的候选向量簇，直至所述候选向量簇大小的方差小于所述预设方差值。When the variance of the candidate vector cluster size is greater than or equal to the preset variance value, the size of the minimum vector cluster is recursively halved, and the cluster is divided according to the halved vector cluster size to generate a new candidate vector cluster until the variance of the candidate vector cluster size is less than the preset variance value.

在一种可选的实施方式中，所述流水线调度器，包括：In an optional implementation, the pipeline scheduler includes:

第一重排序子模块，用于根据所述传输时间信息，将传输时间为零的平衡向量簇的执行顺序移动至首位；A first reordering submodule, configured to move the execution order of the balance vector cluster with a transmission time of zero to the first place according to the transmission time information;

第二重排序子模块，用于根据所述计算时间信息，将所需计算时间更大的平衡向量簇的执行顺序向前调整。The second reordering submodule is used to adjust the execution order of the balance vector cluster requiring a larger calculation time forward according to the calculation time information.

在一种可选的实施方式中，所述流水线调度器，还包括：In an optional implementation, the pipeline scheduler further includes:

节点遍历子模块，用于使用动态规划算法，遍历搜索树的节点，每个所述节点表征一种分组方案，每个节点的子节点表示所述分组方案的子方案；A node traversal submodule, used to traverse the nodes of the search tree using a dynamic programming algorithm, each of the nodes represents a grouping scheme, and the child nodes of each node represent the sub-schemes of the grouping scheme;

时间区间计算子模块，用于通过完全重叠剩余传输和计算，忽略流水线处理开销，计算得到执行当前节点的分组方案预计所需的时间区间；A time interval calculation submodule is used to calculate the estimated time interval required to execute the grouping scheme of the current node by completely overlapping the remaining transmission and calculation and ignoring the pipeline processing overhead;

剪枝子模块，用于在所述时间区间的最小值，大于其他节点的时间区间的最小值的情况下，将所述当前节点及其子节点进行剪枝；A pruning submodule, used for pruning the current node and its child nodes when the minimum value of the time interval is greater than the minimum value of the time interval of other nodes;

确定子模块，用于将剪枝后所述搜索树中最终剩余的节点所对应的分组方案，确定为所述分组计划。The determination submodule is used to determine the grouping scheme corresponding to the nodes finally remaining in the search tree after pruning as the grouping plan.

在一种可选的实施方式中，所述声明式应用编程接口，包括：In an optional implementation, the declarative application programming interface includes:

查询请求信息接收子模块，用于接收用户输入的查询请求信息，所述查询请求信息至少包括：批量查询请求的向量、搜索的精度要求、预期查询处理的时间、返回最邻近向量的个数；A query request information receiving submodule, configured to receive query request information input by a user, wherein the query request information includes at least: vectors of a batch query request, search accuracy requirements, expected query processing time, and the number of nearest neighbor vectors returned;

转化子模块，用于将所述查询请求信息转化为搜索配置和资源配置；A conversion submodule, used to convert the query request information into search configuration and resource configuration;

请求生成子模块，用于执行所述搜索配置和所述资源配置，生成所述批量查询请求。The request generation submodule is used to execute the search configuration and the resource configuration to generate the batch query request.

在一种可选的实施方式中，所述GPU处理器，包括：In an optional implementation, the GPU processor includes:

存储管理器，用于从所述全局分组队列中提取传输任务，维护本地分组传输队列；A storage manager, configured to extract transmission tasks from the global packet queue and maintain a local packet transmission queue;

所述存储管理器，还用于根据所述本地分组传输队列，在执行完成当前分组的传输任务后，通知内核控制器执行对应的计算任务，并开始执行下一分组的传输任务；The storage manager is further used to notify the kernel controller to execute the corresponding computing task and start to execute the transmission task of the next packet according to the local packet transmission queue after completing the transmission task of the current packet;

内核控制器，用于从所述全局分组队列中提取计算任务，维护本地分组计算队列；A kernel controller, used for extracting computing tasks from the global packet queue and maintaining a local packet computing queue;

所述内核控制器，还用于根据本地分组计算队列，执行当前分组的计算任务，在执行完成所有分组后，生成向量查询结果。The kernel controller is further used to execute the calculation task of the current group according to the local group calculation queue, and generate the vector query result after completing the execution of all groups.

在一种可选的实施方式中，所述内核控制器，包括动态内核拓展模块和多个流处理器；In an optional implementation, the kernel controller includes a dynamic kernel expansion module and a plurality of stream processors;

所述动态内核拓展模块，用于根据平衡向量簇地址的特征，将当前分组中的每个所述平衡向量簇划分为多个子向量簇；The dynamic kernel expansion module is used to divide each of the balance vector clusters in the current group into a plurality of sub-vector clusters according to the characteristics of the balance vector cluster address;

所述动态内核拓展模块，还用于将所述向量查询请求在每个所述子向量簇上的计算作为一个块计算任务，将所述块计算任务平均分配给所述多个流处理器；The dynamic kernel expansion module is further used to treat the calculation of the vector query request on each of the sub-vector clusters as a block calculation task, and evenly distribute the block calculation tasks to the multiple stream processors;

所述流处理器，用于执行所述动态内核拓展模块分配的所述块计算任务。The stream processor is used to execute the block computing task assigned by the dynamic kernel extension module.

本发明实施例还提供了一种电子设备，参照图10，图10是本发明实施例提出的电子设备的结构示意图。如图10所示，电子设备100包括：存储器110和处理器120，存储器110与处理器120之间通过总线通信连接，存储器110中存储有计算机程序，该计算机程序可在处理器120上运行，进而实现本发明实施例公开的一种基于非结构化数据集的服务器无感知向量查询方法中的步骤。The embodiment of the present invention further provides an electronic device, referring to FIG10, which is a schematic diagram of the structure of the electronic device proposed in the embodiment of the present invention. As shown in FIG10, the electronic device 100 includes: a memory 110 and a processor 120, the memory 110 and the processor 120 are connected via a bus communication, the memory 110 stores a computer program, and the computer program can be run on the processor 120, thereby implementing the steps in a server-unaware vector query method based on an unstructured data set disclosed in the embodiment of the present invention.

本发明实施例还提供了一种计算机可读存储介质，其上存储有计算机程序/指令，该计算机程序/指令被处理器执行时实现本发明实施例公开的一种基于非结构化数据集的服务器无感知向量查询方法中的步骤。An embodiment of the present invention further provides a computer-readable storage medium having a computer program/instruction stored thereon. When the computer program/instruction is executed by a processor, the steps of a server-unaware vector query method based on an unstructured data set disclosed in an embodiment of the present invention are implemented.

本发明实施例还提供了一种计算机程序产品，包括计算机程序/指令，该计算机程序/指令被处理器执行时实现本发明实施例公开的一种基于非结构化数据集的服务器无感知向量查询方法中的步骤。An embodiment of the present invention further provides a computer program product, including a computer program/instruction, which, when executed by a processor, implements the steps in a server-unaware vector query method based on an unstructured data set disclosed in an embodiment of the present invention.

本说明书中的各个实施例均采用递进的方式描述，每个实施例重点说明的都是与其他实施例的不同之处，各个实施例之间相同相似的部分互相参见即可。The various embodiments in this specification are described in a progressive manner, and each embodiment focuses on the differences from other embodiments. The same or similar parts between the various embodiments can be referenced to each other.

本发明实施例是参照根据本发明实施例的方法、装置、电子设备和计算机程序产品的流程图和／或方框图来描述的。应理解可由计算机程序指令实现流程图和／或方框图中的每一流程和／或方框、以及流程图和／或方框图中的流程和／或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理终端设备的处理器以产生一个机器，使得通过计算机或其他可编程数据处理终端设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和／或方框图一个方框或多个方框中指定的功能的装置。The embodiments of the present invention are described with reference to the flowcharts and/or block diagrams of the methods, devices, electronic devices, and computer program products according to the embodiments of the present invention. It should be understood that each process and/or box in the flowchart and/or block diagram, as well as the combination of the processes and/or boxes in the flowchart and/or block diagram, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing terminal device to generate a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device generate a device for implementing the functions specified in one process or multiple processes in the flowchart and/or one box or multiple boxes in the block diagram.

尽管已描述了本发明实施例的优选实施例，但本领域内的技术人员一旦得知了基本创造性概念，则可对这些实施例做出另外的变更和修改。所以，所附权利要求意欲解释为包括优选实施例以及落入本发明实施例范围的所有变更和修改。Although the preferred embodiments of the present invention have been described, those skilled in the art may make additional changes and modifications to these embodiments once they have learned the basic creative concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present invention.

最后，还需要说明的是，在本文中，诸如第一和第二等之类的关系术语仅仅用来将一个实体或操作与另一个实体或操作区分开来，而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、物品或者终端设备不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、物品或者终端设备所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括所述要素的过程、方法、物品或者终端设备中还存在另外的相同要素。Finally, it should be noted that, in this article, relational terms such as first and second, etc. are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or terminal device including a series of elements includes not only those elements, but also other elements not explicitly listed, or also includes elements inherent to such process, method, article or terminal device. In the absence of further restrictions, the elements defined by the sentence "including one..." do not exclude the existence of other identical elements in the process, method, article or terminal device including the elements.

以上对本发明所提供的一种基于非结构化数据集的服务器无感知向量查询方法和系统，进行了详细介绍，本文中应用了具体个例对本发明的原理及实施方式进行了阐述，以上实施例的说明只是用于帮助理解本发明的方法及其核心思想；同时，对于本领域的一般技术人员，依据本发明的思想，在具体实施方式及应用范围上均会有改变之处，综上所述，本说明书内容不应理解为对本发明的限制。The above is a detailed introduction to a server-aware vector query method and system based on an unstructured data set provided by the present invention. This article uses specific examples to illustrate the principles and implementation methods of the present invention. The description of the above embodiments is only used to help understand the method of the present invention and its core idea; at the same time, for general technical personnel in this field, according to the idea of the present invention, there will be changes in the specific implementation method and application scope. In summary, the content of this specification should not be understood as a limitation on the present invention.

Claims

1. A server-unaware vector query method based on an unstructured data set, characterized in that the method comprises:

Obtaining a batch query request by using a server-unaware declarative application programming interface, where the batch query request includes a plurality of vector query requests;

Using an IVF index constructed offline based on an unstructured data set, multiple vector clusters corresponding to the batch query request are searched and obtained, and a query plan is generated, wherein the query plan represents: a transmission order of the multiple vector clusters from the main memory to the video memory of the GPU, and a calculation order of the vector query request in the vector cluster; wherein each of the vector clusters is divided into multiple balanced vector clusters; the balanced vector cluster represents a vector cluster whose variance value obtained by dividing the vector cluster is less than a preset variance value;

Optimizing the query plan, eliminating redundant transmissions in the query plan, and obtaining an optimized query plan;

Acquire transmission time information and calculation time information, and reorder the optimized query plan with the balance vector cluster as the granularity to obtain the optimal execution order;

Using a dynamic programming algorithm, grouping the optimal execution sequence to obtain a grouping plan;

According to the grouping plan, each group is pushed into the global grouping queue for transmission and calculation to obtain a vector query result.

2. The server-unaware vector query method based on unstructured data sets according to claim 1, characterized in that said optimizing the query plan and eliminating redundant transmission in the query plan comprises:

Adjusting the number of transmissions of each of the vector clusters to at most once;

The calculation order of the vector clusters already existing in the video memory of the GPU is adjusted forward.

3. The server-unaware vector query method based on unstructured data sets according to claim 1, wherein the step of obtaining transmission time information and calculation time information comprises:

The transmission time information of each vector cluster is calculated according to the following formula:

Transfer time = ,in, is the size of the equilibrium vector cluster within the vector cluster, a and b are transmission time parameters pre-fitted by the least square method, and m represents the number of the equilibrium vector clusters within the vector cluster;

The calculation time information of each vector cluster is calculated according to the following formula:

Compute time = , where A and B are the calculation time parameters obtained in advance by least squares fitting;

The transmission time information represents the time information required for each of the vector clusters to be transmitted from the main memory to the video memory of the GPU, and the calculation time information represents the time information required for calculating the corresponding query request in each of the vector clusters.

4. The server-unaware vector query method based on unstructured data sets according to claim 1, characterized in that the vector cluster is divided into a plurality of balanced vector clusters by the following steps, the steps comprising:

Dividing each of the vector clusters according to the size of the minimum vector cluster to obtain a plurality of candidate vector clusters;

Determining whether the variance of the candidate vector cluster size is less than the preset variance value;

In a case where the variance of the size of the candidate vector cluster is less than the preset variance value, determining the candidate vector cluster as the balanced vector cluster;

When the variance of the candidate vector cluster size is greater than or equal to the preset variance value, the size of the minimum vector cluster is recursively halved, and the cluster is divided according to the halved vector cluster size to generate a new candidate vector cluster until the variance of the candidate vector cluster size is less than the preset variance value.

5. The server-unaware vector query method based on unstructured data sets according to claim 1, characterized in that the obtaining of transmission time information and calculation time information, taking the balance vector cluster as the granularity, and reordering the optimized query plan comprises:

According to the transmission time information, the execution order of the balance vector cluster with zero transmission time is moved to the first place;

According to the calculation time information, the execution order of the balance vector cluster requiring a larger calculation time is adjusted forward.

6. The server-unaware vector query method based on unstructured data sets according to claim 1, characterized in that the use of a dynamic programming algorithm to group the optimal execution order to obtain a grouping plan comprises:

Using a dynamic programming algorithm, traverse the nodes of the search tree, each of the nodes represents a grouping scheme, and the child nodes of each node represent the sub-schemes of the grouping scheme;

By completely overlapping the remaining transmission and calculation, ignoring the pipeline processing overhead, the estimated time interval required to execute the grouping scheme of the current node is calculated;

When the minimum value of the time interval is greater than the minimum value of the time interval of other nodes, pruning the current node and its child nodes;

The grouping scheme corresponding to the nodes finally remaining in the search tree after pruning is determined as the grouping plan.

7. The server-unaware vector query method based on unstructured data sets according to claim 1, wherein the step of obtaining batch query requests by using a server-unaware declarative application programming interface comprises:

Receive query request information input by a user, wherein the query request information includes at least: vectors of a batch query request, search accuracy requirements, expected query processing time, and the number of nearest neighbor vectors returned;

The query request information is converted into search configuration and resource configuration through the server-unaware declarative application programming interface;

The search configuration and the resource configuration are executed to generate the batch query request.

8. The server-unaware vector query method based on unstructured data sets according to claim 1, characterized in that the step of pushing each group into a global group queue for transmission and calculation according to the grouping plan to obtain a vector query result comprises:

Extracting transmission tasks from the global packet queue through a storage manager and maintaining a local packet transmission queue;

According to the local packet transmission queue, after completing the transmission task of the current packet, the storage manager notifies the kernel controller to execute the corresponding computing task and starts to execute the transmission task of the next packet;

Extracting computing tasks from the global packet queue through the kernel controller and maintaining a local packet computing queue;

According to the local group calculation queue, the kernel controller executes the calculation task of the current group, and after completing the execution of all groups, the kernel controller generates a vector query result.

9. The server-unaware vector query method based on unstructured data sets according to claim 8, wherein the kernel controller executes the calculation task of the current group, including:

According to the characteristics of the balance vector cluster address, each of the balance vector clusters in the current group is divided into a plurality of sub-vector clusters;

The calculation of the vector query request on each of the sub-vector clusters is regarded as a block calculation task, and the block calculation tasks are evenly distributed to multiple stream processors on the GPU;

Each of the stream processors executes the assigned block computing task.

10. A server-unaware vector query system based on unstructured data sets, characterized in that the system comprises:

A declarative application programming interface, used to obtain a batch query request using a server-unaware declarative application programming interface, wherein the batch query request includes multiple vector query requests;

A vector database, used to use an IVF index built offline based on an unstructured data set to search for multiple vector clusters corresponding to the batch query request and generate a query plan, wherein the query plan represents: a transmission order of the multiple vector clusters from the main memory to the display memory of the GPU, and a calculation order of the vector query request in the vector cluster; wherein each of the vector clusters is divided into multiple balanced vector clusters; the balanced vector cluster represents a vector cluster whose variance value obtained by dividing the vector cluster is less than a preset variance value;

A query plan optimization module, used to optimize the query plan, eliminate redundant transmissions in the query plan, and obtain an optimized query plan;

A pipeline scheduler, used for obtaining transmission time information and calculation time information, and reordering the optimized query plan with the balance vector cluster as the granularity to obtain the optimal execution order;

The pipeline scheduler is further used to group the optimal execution order using a dynamic programming algorithm to obtain a grouping plan;

The GPU processor is used to push each group into the global grouping queue for transmission and calculation according to the grouping plan to obtain the vector query result.