FR2974206A1

FR2974206A1 - GPU SITE

Info

Publication number: FR2974206A1
Application number: FR1101188A
Authority: FR
Inventors: Emmanuel Bardiere; Fabrice Ferrand
Original assignee: Sagem Defense Securite SA
Current assignee: Safran Electronics and Defense SAS
Priority date: 2011-04-15
Filing date: 2011-04-15
Publication date: 2012-10-19
Anticipated expiration: 2031-04-15
Also published as: FR2974206B1; EP2697723A1; WO2012140024A1

Abstract

The invention relates to data storage and access methods that are managed entirely within the graphics processing unit of a computer. The data is stored in a kd tree, adapted to the architecture of the processing unit, in the RAM of the graphics processing unit. The algorithms for arranging the data and for accessing it are adapted to parallel execution by the graphics processing unit. In this way, the computing power of the graphics unit is used to advantage to speed up management of the database.

Description

The present invention relates to the field of databases, and more particularly to methods for storing and accessing data on a machine equipped with a powerful graphics processor.

This document focuses on the architecture of modern GIS (Geographic Information Systems) built on large vector databases. As will be seen, the solutions described go beyond this setting and apply to any database, regardless of the type of data handled.

Modern GIS are built on a database containing a large number, typically millions, of elementary objects. These elementary objects typically comprise, on the one hand, basic vector primitives such as points, vectors and polygons, which are geometric in nature and describe the location of the primitive, and, on the other hand, semantic attributes of various kinds, for example the type of road. The data can therefore be seen as a collection of N-dimensional vectors, where N is the number of attributes of each primitive.

When the system is used, the user navigates in a geographic view, or scene. This scene is typically a geographic area, possibly combined with criteria, specific to the current application, for selecting the elements to display. Displaying it therefore requires a query on the database to retrieve the primitives to display for the desired view, possibly a styling step in which certain application-dependent parameters of these primitives, such as the display color, are set, and then the drawing of these primitives.

These steps involve a data access module that typically runs on the processor of the computer in use. This module accesses data stored on an accessible storage medium such as a hard disk or a flash-memory disk, or SSD (Solid State Drive). To speed up processing, it is known to load all the data into the random access memory (RAM) of the computer's main processor, or CPU (Central Processing Unit). Even so, the amount of data, typically gigabytes, and the resolution of the screens, typically several thousand points in each dimension, lead to processing times of several minutes on the most powerful processors currently available. Once acquired, the data needed to display a view is transferred to a graphics processor, or GPU (Graphics Processing Unit), as a stream of primitives to be drawn on the screen.

The invention aims to solve the above problems by using data storage and access methods that are managed entirely within the graphics processing unit of the computer. The data is stored in a kd tree, adapted to the architecture of the processing unit, in the RAM of the graphics processing unit. The algorithms for arranging the data and for accessing it are adapted to parallel execution by the graphics processing unit. In this way, the computing power of the graphics unit is used to advantage to speed up management of the database.
The present invention relates to a method for storing a collection of points of a multidimensional space in the memory of an information processing device, characterized in that it comprises:
- a preliminary step of creating one rank vector per dimension and an index vector, said points of the collection being stored in an array, said index vector containing the index in the array of each point, each rank vector containing the index of the corresponding point in an array in which the points would be sorted along the corresponding dimension;
- a step of building a kd tree representing said collection of points, the points being distributed at each node into values lower and higher than the median of the values along one of the dimensions, the distribution being made on the rank vectors and the index vector, the points assigned to each child not being sorted among themselves, and the array storing the collection of points not being modified.

According to one embodiment, the step of building the kd tree comprises:
- a step of distributing the values of a rank vector into values lower than and values higher than the median;
- a step of propagating said distribution to the other rank vectors and to the index vector;
- a step of renumbering the distributed values within each semi-vector of another rank vector, corresponding to the dimension along which the next distribution step is to be performed.
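
As an illustration of the preliminary step of the storage method, the index vector and the rank vectors could be computed as in the following Python sketch. It is a sequential illustration only (the embodiment targets a graphics processing unit), and the function name `build_rank_and_index_vectors`, the tuple representation of points and the tie-breaking rule (ties resolved by array position) are assumptions that do not come from the source.

```python
def build_rank_and_index_vectors(points):
    """Return (index_vector, rank_vectors) for a list of equal-length tuples.

    index_vector[i] is the position of point i in the (unmodified) array.
    rank_vectors[d][i] is the position point i would occupy if the array
    were sorted along dimension d.
    """
    n = len(points)
    n_dims = len(points[0]) if n else 0
    index_vector = list(range(n))
    rank_vectors = []
    for d in range(n_dims):
        # Python's sort is stable, so equal values keep their array order
        # (an assumption: the source does not say how ties are handled).
        order = sorted(range(n), key=lambda i: points[i][d])
        rank = [0] * n
        for position, i in enumerate(order):
            rank[i] = position
        rank_vectors.append(rank)
    return index_vector, rank_vectors


# Four 2-D points (x, y); the array itself is never reordered.
points = [(5.0, 1.0), (2.0, 4.0), (9.0, 3.0), (1.0, 2.0)]
index_vector, rank_vectors = build_rank_and_index_vectors(points)
```

Because each rank vector is a permutation of 0 to n-1, the median along any dimension is simply the rank n/2, so no value comparison on the attribute data is needed once the ranks exist.
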
According to one embodiment, the distribution step comprises:
- a step (4.1) of computing a first vector of binary flags, of the same length as the vector to be distributed, each value of which is the result of comparing the corresponding value of the array with the value of the index of the median;
- a step (4.2) of computing a second vector of binary flags, of the same length as the vector to be distributed, each value of which is the inverse of the corresponding value of the first flag vector;
- a step (4.3) of computing the prefix sums of these two flag vectors;
- a step (4.4) of adding the value of the index of the median to all the values of the prefix sum of the second flag vector;
- a step (4.5) of moving the indexes lower than the median to the index given by the value of the first prefix sum;
- a step (4.6) of moving the indexes higher than the median to the index given by the value of the second prefix sum.

According to one embodiment, the renumbering step comprises, the vector to be renumbered (6.2) being of length N/2 and containing values between 0 and N-1:
- a step of building a flag vector (6.3) of length N containing, for each index, the Boolean value 1 if the index is present in the array and 0 otherwise;
- a step of building the prefix sum (6.4) of said flag vector;
- a step in which each value of the initial array (6.2) is replaced by the value of the prefix sum array (6.4) whose index corresponds to said value of the initial array.

According to one embodiment, said information processing device comprising a plurality of scalar processors having a local memory, the step of building the kd tree is stopped as soon as the collection of points associated with a leaf of the tree has a memory size smaller than the size of this local memory.

According to one embodiment, each node of the tree stores two points representing a hypercube bounding the collection of points associated with this node.
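
The distribution steps (4.1) to (4.6) and the renumbering steps (6.3) and (6.4) can be sketched in Python as follows. This is a sequential illustration under stated assumptions, not the patented implementation: the prefix sums are taken to be exclusive (so that target positions start at 0), the "lower than the median" test is taken to be `rank < median`, and all function names are hypothetical. On a graphics processing unit each list comprehension and the scatter loop would be data-parallel operations.

```python
def exclusive_scan(flags):
    """Exclusive prefix sum: out[i] is the sum of flags[0..i-1]."""
    out, running = [], 0
    for f in flags:
        out.append(running)
        running += f
    return out


def distribution_targets(rank, median):
    """Target position of every element after splitting around the median."""
    flags_low = [1 if r < median else 0 for r in rank]        # step 4.1
    flags_high = [1 - f for f in flags_low]                   # step 4.2
    pos_low = exclusive_scan(flags_low)                       # step 4.3
    pos_high = [p + median                                    # steps 4.3, 4.4
                for p in exclusive_scan(flags_high)]
    return [pos_low[i] if flags_low[i] else pos_high[i]       # steps 4.5, 4.6
            for i in range(len(rank))]


def scatter(vector, targets):
    """Apply the same reordering to a rank vector or to the index vector."""
    out = [None] * len(vector)
    for i, t in enumerate(targets):
        out[t] = vector[i]
    return out


def renumber(semi_vector, n):
    """Map N/2 values taken from 0..N-1 onto 0..N/2-1, preserving order."""
    flags = [0] * n                                           # vector 6.3
    for v in semi_vector:
        flags[v] = 1
    prefix = exclusive_scan(flags)                            # vector 6.4
    return [prefix[v] for v in semi_vector]


# Ranks of four points along x and y; split along x around median rank 2.
rank_x, rank_y, index_vector = [2, 1, 3, 0], [0, 3, 2, 1], [0, 1, 2, 3]
targets = distribution_targets(rank_x, 2)
new_rank_x = scatter(rank_x, targets)
new_index = scatter(index_vector, targets)
new_rank_y = scatter(rank_y, targets)
left_y = renumber(new_rank_y[:2], 4)    # y ranks within the left child
right_y = renumber(new_rank_y[2:], 4)   # y ranks within the right child
```

In this example the left half of `new_rank_x` is `[1, 0]`: the values are on the correct side of the median but are not sorted among themselves, as the method states. After renumbering, each semi-vector of the y ranks is again a dense permutation, so the next split can use the median rank of the semi-vector directly.
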

The present invention also relates to a method for searching, within a collection of points of a multidimensional space, for the points belonging to a hypercube of said space, the collection of points being stored in the form of a kd tree, characterized in that it comprises:
- a step (7.1) of initializing to zero a vector of length equal to the number of leaves in the kd tree, the first value being set to one, this vector representing, for a given depth of the tree, the indices of the nodes to be searched;
- a test step (7.2) in which, for each node whose index is set to 1 at the current depth, starting with the root, the indices of the children of the tested node are set to 1 if the hypercube bounding the points associated with this node has a non-empty intersection with the sought hypercube;
- a step (7.3) of testing whether any nodes remain to be searched; if so, the method loops back to the test step for the next depth;
- a step (7.4) of testing the points associated with the selected leaves to obtain the result.
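
The level-by-level search of steps (7.1) to (7.4) can be sketched in Python as follows. The data layout is an assumption made for the example: a complete binary tree of depth `depth`, with `boxes[level][j]` holding the bounding hypercube of the j-th internal node at that level and `leaves[j]` holding the point indexes of the j-th leaf. The names `range_search` and `boxes_intersect` are hypothetical. The inner loop over `j` is sequential here but has no dependency between iterations, which is what makes the scheme suitable for parallel execution without recursion.

```python
def boxes_intersect(lo_a, hi_a, lo_b, hi_b):
    """True if two axis-aligned hypercubes overlap in every dimension."""
    return all(la <= hb and lb <= ha
               for la, ha, lb, hb in zip(lo_a, hi_a, lo_b, hi_b))


def range_search(points, boxes, leaves, depth, q_lo, q_hi):
    """Return indexes of the points lying inside the hypercube [q_lo, q_hi]."""
    n_leaves = 1 << depth
    active = [0] * n_leaves
    active[0] = 1                                   # step 7.1: only the root
    for level in range(depth):
        nxt = [0] * n_leaves
        for j in range(1 << level):                 # step 7.2
            if active[j]:
                lo, hi = boxes[level][j]
                if boxes_intersect(lo, hi, q_lo, q_hi):
                    nxt[2 * j] = 1                  # left child
                    nxt[2 * j + 1] = 1              # right child
        active = nxt
        if not any(active):                         # step 7.3
            break
    result = []
    for j in range(n_leaves):                       # step 7.4
        if active[j]:
            for i in leaves[j]:
                if all(l <= x <= h
                       for x, l, h in zip(points[i], q_lo, q_hi)):
                    result.append(i)
    return result


# Depth-1 tree over four 2-D points: one root box, two leaves.
points = [(5, 1), (2, 4), (9, 3), (1, 2)]
boxes = [[((1, 1), (9, 4))]]
leaves = [[1, 3], [0, 2]]
hits = range_search(points, boxes, leaves, 1, (0, 0), (3, 5))
misses = range_search(points, boxes, leaves, 1, (20, 20), (30, 30))
```

The flag vector has a fixed length equal to the number of leaves, so it can be allocated once; this matters on hardware where memory is allocated statically.
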
The present invention further relates to a database management system comprising a graphics processing unit and means for storing a collection of points of a multidimensional space in the memory of an information processing device, characterized in that it comprises:
- means for creating one rank vector per dimension and an index vector, said points of the collection being stored in an array, said index vector containing the index in the array of each point, each rank vector containing the index of the corresponding point in an array in which the points would be sorted along the corresponding dimension;
- means for building a kd tree representing said collection of points, the points being distributed at each node into values lower and higher than the median of the values along one of the dimensions, the distribution being made on the rank vectors and the index vector, the points assigned to each child not being sorted among themselves, and the array storing the collection of points not being modified.
The present invention also relates to a database management system comprising a graphics processing unit and means for searching, within a collection of points of a multidimensional space, for the points belonging to a hypercube of said space, the collection of points being stored in the form of a kd tree, characterized in that it comprises:
- means for initializing to zero a vector of length equal to the number of leaves in the kd tree, the first value being set to one, this vector representing, for a given depth of the tree, the indices of the nodes to be searched;
- means for testing, for each node whose index is set to 1 at the current depth, starting with the root, whether the hypercube bounding the points associated with this node has a non-empty intersection with the sought hypercube, and, if so, for setting to 1 the indices of the children of the tested node;
- means for testing whether any nodes remain to be searched and, if so, for looping back to the test step for the next depth;
- means for testing the points associated with the selected leaves to obtain the result.

The features of the invention mentioned above, as well as others, will become clearer on reading the following description of an exemplary embodiment, said description being given with reference to the accompanying drawings, in which:

Fig. 1 illustrates an example of the architecture of a graphics processing unit.
Fig. 2 illustrates a kd tree.

Fig. 3 illustrates the data structures used in an exemplary embodiment of the invention.
Fig. 4 illustrates the parallel algorithm for distributing a rank vector according to an exemplary embodiment of the invention.
Fig. 5 illustrates an example of the distribution of a rank vector.

Fig. 6 illustrates an example of renumbering the semi-vectors.
Fig. 7 illustrates the selection algorithm within the structure according to an exemplary embodiment of the invention.

Current graphics processing units are massively parallel multiprocessors. These processing units combine several models of parallelism. An example of such an architecture is illustrated in Fig. 1.
The graphics processing unit 1.1 contains a global memory 1.2 accessible to all the processors. It consists of a plurality of multiprocessors 1.3, 1.4, 1.5 and 1.6, each of which itself contains a plurality of scalar processors 1.7, 1.8 and 1.9. These scalar processors operate in SPMD (Single Program Multiple Data) mode, that is, they execute the same program, under the control of an instruction management unit 1.10, on data specific to each processor.

There are several memory spaces within the graphics processing unit. The global memory 1.2 has already been mentioned. Access to this global memory is exclusive, and it must therefore be limited. There are also two caches that speed up access to certain data. A first cache 1.12 is dedicated to storing constants. A second cache 1.13 is dedicated to storing textures. These caches are designed to optimize the rendering of 3D scenes for applications such as games. They can be used for other purposes in other applications. Each multiprocessor also has a local memory 1.11 shared among the scalar processors of the multiprocessor. Access to this local memory is much faster than access to the global memory and, above all, independent of the activity of the other multiprocessors. Each scalar processor also has its own register space, not shown.

The programming model of these units takes advantage of their massively parallel nature. This model is a SIMT (Single Instruction Multiple Thread) model. In this model, the same program is executed in parallel on the various multiprocessors, each multiprocessor being able to execute in parallel groups of lightweight threads (known as warps), each group (warp) being executed on a scalar processor. The exemplary embodiment uses a graphics card from the manufacturer NVidia, which can thus execute 128 groups of 1024 threads. Programming languages dedicated to programming such architectures exist.
Examples include CUDA (Compute Unified Device Architecture) from the manufacturer NVidia and OpenCL (Open Computing Language), originally proposed by Apple and currently maintained by the Khronos Group. The latter is used in the exemplary embodiment.

The invention is based on the idea of managing the database entirely on such a graphics processing unit. Such a database is typically managed as a data tree. This arrangement of the data allows fast queries, in particular on the geometric attributes. Implementing such an arrangement on a graphics processing unit architecture raises several problems. A fast method of building the tree in the global memory of the unit must be found. The arrangement must also be adapted so that queries can be performed quickly. Conventional tree management algorithms are highly recursive. One particularity of graphics processing units is that they have no stack and are therefore unsuited to executing recursive programs. Moreover, even if they did have such a stack, these recursive algorithms offer no directly exploitable parallelism.

The data arrangement chosen is a kd tree. Recall that the aim is to store a collection of points, consisting of N-dimensional vectors, in the global memory of the graphics processing unit. In a kd tree (an example is shown in Fig. 2), every node except the leaves has exactly two children. The data is stored in such a tree according to the method now described.

An advantageous first step makes it possible to balance the tree. This initial step consists in computing the number of levels of the tree and the size of the leaves as a function of the size of the available local memory. The total size of the tree is deduced from this, which makes it possible to compact the data and allocate it statically.
Indeed, graphics processing units do not generally allow dynamic memory allocation. The final tree is strongly balanced because the filling of the leaves and the distribution of the levels are maximal, which increases the degree of parallelism and thus the final performance. For parallelism to be exploited well, it is important that the data be distributed homogeneously in the tree. Each node has a collection of points that must be distributed between its left and right children. The root corresponds to the entire collection.
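As an illustration of the sizing step described above, the following Python sketch derives the depth of the tree and its node count from the size of the local memory, so that the whole tree can be allocated statically. The function name `plan_tree`, its parameters, and the choice of rounding the number of leaves up to a power of two are assumptions made for the example; the description does not specify them.

```python
def plan_tree(num_points, bytes_per_point, local_memory_bytes):
    """Return (depth, leaf_count, node_count) for a complete binary kd-tree.

    Hypothetical helper: a leaf is sized so that its points fit in the
    local memory of one multiprocessor.
    """
    # Largest number of points that fits in local memory.
    leaf_capacity = local_memory_bytes // bytes_per_point
    if leaf_capacity < 1:
        raise ValueError("a single point does not fit in local memory")
    # Number of leaves needed to hold every point (ceiling division).
    leaves_needed = -(-num_points // leaf_capacity)
    # Smallest depth d such that 2**d >= leaves_needed, using integers only.
    depth = (leaves_needed - 1).bit_length()
    leaf_count = 2 ** depth
    # A complete binary tree with leaf_count leaves has 2 * leaf_count - 1 nodes.
    node_count = 2 * leaf_count - 1
    return depth, leaf_count, node_count


# Assumed figures: one million 16-byte points, 16 KiB of local memory.
depth, leaf_count, node_count = plan_tree(1_000_000, 16, 16 * 1024)
```

Because all three values are known before any point is placed, the arrays holding the nodes can be reserved once, which matches the constraint that memory is not allocated dynamically on the device.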

The distribution is then made according to one dimension of the attribute space, and this sequentially for each dimension until the desired filling of the leaves is obtained. This dimension depends on the depth. For example, the distribution at the root 2.1 can be made according to the first dimension, typically the geographic X dimension. The distribution of the nodes of depth 1, that is, nodes 2.2 and 2.3, can be made according to the geographic Y dimension. The distribution of the nodes of depth 2, that is, nodes 2.4 to 2.7, can be made according to the geographic Z dimension. The distribution of the nodes of depth 3, that is, nodes 2.8 to 2.15, can be made according to the first semantic dimension, for example the type of road, and so on. The distribution is made according to the median value of the collection to be distributed in the selected dimension. The points of the collection whose value in the given dimension is less than the median value are assigned to the left child. Those whose value is greater than the median are assigned to the right child.
Advantageously, two points defining the bounding window of the points of the collection to be distributed are stored at the parent node. A bounding window means a set of bounds, typically two, for each dimension of the stored data, so as to form a hypercube enclosing the data. In the case of geographic or geometric attributes, these bounds correspond to spatial limits. The minimum point contains, in each dimension, the minimum value of the attribute over all the points. The maximum point contains, in each dimension, the maximum value of the attribute over all the points. This speeds up searches. To make the most of the parallelism, it is advantageous to be able to saturate the local memory of the processors when performing the processing. It is therefore counterproductive to push the division of the tree down to ultimate leaves containing only one point of the data collection. On the contrary, it is advantageous to stop dividing the tree when a node manages a collection of points whose size in memory equals the size of the local memory of the processors. In this way, this collection can be loaded in full into local memory for the various parallel processing operations to be carried out. Alternatively, the size of the ultimate collection is defined as the number of threads managed by a processor. This alternative leads to processing in which each thread handles one point of the collection at a time.
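The depth-dependent choice of dimension and the bounding window stored at each node can be sketched as follows. This is a minimal Python illustration: the helper names are hypothetical, and the cyclic rule `depth % num_dims` is an assumption, the description stating only that the dimensions are taken sequentially.

```python
def split_dimension(depth, num_dims):
    """Dimension used to split the nodes at a given depth (assumed cyclic)."""
    return depth % num_dims


def bounding_window(points):
    """Return the (minimum point, maximum point) pair enclosing `points`.

    Each point is a tuple of N attribute values; the two returned tuples
    hold, per dimension, the smallest and largest value found.
    """
    dims = range(len(points[0]))
    minimum = tuple(min(p[d] for p in points) for d in dims)
    maximum = tuple(max(p[d] for p in points) for d in dims)
    return minimum, maximum


# Three points with attributes (X, Y, Z); the values are made up.
window = bounding_window([(1, 5, 0), (3, 2, 9), (2, 8, 4)])
```

With four attributes (X, Y, Z and one semantic attribute), `split_dimension` returns 0, 1, 2 and 3 for depths 0 to 3, which reproduces the order given in the example of Fig. 2.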

The exemplary embodiment of the invention proposes an algorithm for creating the kd-tree which does not require sorting all the data at each step and which, moreover, can be parallelized. The proposed method consists in performing a preliminary sorting step per dimension before any calculation. To do this, a rank vector of the same length as the collection of points stores, for each point, the rank that this point would have in the sorted vector. Such a rank vector is created for each dimension of the collection of points. The collection of points itself is not reordered during this preliminary step, nor during any of the steps of constructing and updating the tree, which advantageously avoids the cost of moving the data themselves. This is illustrated in Fig. 3. This figure shows a collection 3.1 of points stored in a table, each point having 4 attributes laid out horizontally 3.2, the collection here comprising 8 points laid out vertically 3.3.
For each dimension, a vector 3.4, 3.5, 3.6 and 3.7 is created, holding the indexes that the corresponding points would have if the collection were sorted according to the corresponding dimension. Typically, vector 3.4 contains the rank of the points of the collection if it were sorted according to the first dimension, vector 3.5 the rank if it were sorted according to the second dimension, vector 3.6 the rank if it were sorted according to the third dimension, and vector 3.7 the rank if it were sorted according to the fourth dimension. An additional index vector 3.8 is also created. This index vector is initialized with the increasing indexes, from 1 to N, of the points of the collection. Indeed, when the rank vectors are reordered during the construction of the kd-tree, this index vector is advantageously reordered as well, so as to avoid reordering the table 3.1 that contains the collection of points. The sorting algorithm used for this preliminary step should advantageously be chosen for its efficiency on a massively parallel architecture such as that of graphics processing units. A method related to quicksort, which computes the k-th smallest element of a vector, can be used. This method is due to Hoare and has complexity O(n log n). Another, more efficient method called median of medians, of complexity O(n), can also be used. A parallel bitonic sort can advantageously be performed as well. A step of constructing the kd-tree representing the collection of points must then be carried out. This construction requires traversing the tree from the root to distribute the collection over the two children of each node. We have seen that the distribution is made according to one of the dimensions, that is, one of the attributes of the points to be distributed. The chosen dimension depends on the depth in the tree of the node being processed.
Traditionally, this distribution involves sorting the points to be distributed according to the current dimension when each node is processed, and thus sorting the whole set of points for each depth of the tree. Advantageously, the solution proposed here uses the rank vectors created during the preliminary step to distribute the points according to their position relative to the median in the chosen dimension. Points whose attribute value in the current dimension is less than the median value are assigned to the left child, while points whose attribute value in the current dimension is greater than the median value are assigned to the right child. The points assigned to each child are not sorted among themselves according to the current dimension, since the table storing the collection of points is not modified. The distribution is done in three steps for each node. In a first step, the indexes of the rank vector of the dimension considered are distributed according to the median. In a second step, this distribution is propagated to all the rank vectors as well as to the index vector. In a third step, the indexes of the points in the next dimension are renumbered for each half-vector, to prepare the split of these half-vectors according to the median in the next step.
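The preliminary step that builds one rank vector per dimension and the index vector can be sketched in Python as follows. The function names are hypothetical, a sequential `sorted` call stands in for the parallel sort of the embodiment, and ranks and indexes are numbered from 0 here for convenience (the description initializes the index vector from 1 to N).

```python
def rank_vector(points, dim):
    """Rank each point would have if the collection were sorted on `dim`."""
    # Positions of the points in sorted order; ties keep their original order.
    order = sorted(range(len(points)), key=lambda i: points[i][dim])
    ranks = [0] * len(points)
    for rank, point_index in enumerate(order):
        ranks[point_index] = rank
    return ranks


def build_rank_vectors(points):
    """Return (one rank vector per dimension, index vector).

    The `points` table itself is never reordered.
    """
    num_dims = len(points[0])
    ranks = [rank_vector(points, d) for d in range(num_dims)]
    index = list(range(len(points)))
    return ranks, index


# Four made-up points with two attributes each.
ranks, index = build_rank_vectors([(5, 1), (2, 9), (7, 4), (1, 6)])
# ranks[0] is [2, 1, 3, 0]: the first point has the third-smallest value
# in dimension 0, the fourth point has the smallest.
```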

The parallel algorithm for distributing a rank vector according to the median is illustrated in Fig. 4. An example on a rank vector of length 16 is illustrated in Fig. 5. The starting point is a rank vector that stores, for each point, the index it would have if the table were sorted according to the dimension considered. If the vector has length N, the rank of the median is N/2. The purpose of the operation is therefore to move the values lower than the median N/2 into the first part of the vector and the values higher than the median N/2 into the second part. In an iterative solution, each value would be tested against the median and stored in the appropriate part. To exploit the parallelism, the notion of a flag vector is defined. A flag vector is a binary vector. Each element has a binary value, zero or one, which stores the result of a test on the corresponding value of the vector to be distributed. The starting vector is referenced 5.1. A first flag vector 5.2 is calculated in step 4.1. This vector represents the result of comparing each rank with the rank of the median, 8 in the example. The flag vector 5.2 contains a 1 if the corresponding rank is less than the median and a 0 otherwise. In step 4.2, the vector 5.4, which contains the inverse values, is also created. In step 4.3, the vectors of the prefix sums of these two flag vectors are created. Vector 5.3 is the prefix sum of vector 5.2, and vector 5.5 is the prefix sum of vector 5.4. In step 4.4, the median, here 8, is added to the second prefix vector 5.5 to obtain vector 5.6. In step 4.5, the indexes below the median are reordered.
To do this, every index whose value in the first flag vector 5.2 is one is moved into the final vector, at the rank in this final vector given by the first prefix vector 5.3. In step 4.6, the indexes above the median are reordered. To do this, every index whose value in the second flag vector 5.4 is one is moved into the final vector, at the rank in this final vector given by the second prefix vector 5.6. The result is a vector 5.7 in which all the indexes lower than the median are in the first half of the vector and the indexes higher than the median are in the second half. This is indeed the split of the starting rank vector 5.1 according to the median. Once this split has been made, the reordering must be propagated to the other rank vectors 3.4 to 3.7 as well as to the index vector 3.8, to keep the data consistent. The reordering steps 4.5 and 4.6 are then applied to these vectors, keeping the same flag vectors and the same prefix vectors as for the split rank vector. When the splits for a given depth of the kd-tree, and therefore for a given dimension, are finished, the splits along the next dimension must begin for the next depth. To do this, the split operations described above must be repeated, but on vectors half the length of the previous ones. This is illustrated in Fig. 6. Vector 6.1 shows on its first row a rank vector according to a first dimension X. This vector takes the values of Fig. 5. The second row shows the rank vector according to a second dimension Y. This second rank vector also contains indexes between 0 and N-1. Splitting the first vector according to X and propagating the result to the second vector according to Y, as described above, leads to vector 6.2, whose first row corresponds to the result for the vector according to X, that is, vector 5.7.
The second row corresponds to the vector according to Y after the same reordering. A new split and distribution of the vector according to Y must therefore be performed for each of the half-vectors of length N/2: the half-vector containing the values 11 to 0 and the half-vector containing the values 14 to 10. Note that the algorithm described above assumes that the values to be distributed lie between 0 and the length of the vector minus 1. This is not the case here, because the vectors to be distributed have length N/2 but contain values between 0 and N-1. The indexes must therefore be renumbered, without changing their order, to obtain vectors of size N/2 containing consecutive values between 0 and N/2-1. The following operations must be performed for each vector of length N/2. The first step computes a flag vector of length N. This is vector 6.3, which for each index i of the flag vector contains a 1 if the index is present in the vector of length N/2. For example, the first value of the flag array 6.3 is 1, because index 0 is present in the first half-vector according to Y of vector 6.2. The third value is 0, because index 2 is not present in the first half-vector according to Y of vector 6.2. The prefix sum 6.4 of this flag vector is then calculated. The renumbered array 6.5 is then obtained by taking, for each index, the value of the prefix sum array 6.4 whose index is the value contained in the initial array. For example, for index 0, the value of the starting array at index 0 is 11; the result at index 0 is therefore assigned the value 6, which corresponds to index 11 in the prefix array 6.4. Similarly, for index 1, the value of the starting array at index 1 is 9; the result at index 1 is therefore assigned the value 5, which corresponds to index 9 in the prefix array 6.4. The same must be done for the second half of vector 6.5.
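One level of this construction, namely the median split by flag vectors and prefix sums (steps 4.1 to 4.6) followed by the renumbering of a half-vector, can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the function names are hypothetical, the prefix sums are taken to be exclusive (each element counts the flags strictly before it), sequential loops stand in for the parallel kernels, and the sample vectors are made up rather than taken from Fig. 5 and Fig. 6.

```python
from itertools import accumulate


def exclusive_prefix_sum(flags):
    """Element i is the sum of flags[0] .. flags[i - 1]."""
    return [total - f for total, f in zip(accumulate(flags), flags)]


def median_split(ranks, companions=()):
    """Split `ranks` (a permutation of 0..n-1) around the median rank n // 2.

    `companions` are the other vectors (remaining rank vectors, index
    vector) that must receive the same reordering.
    """
    n = len(ranks)
    median = n // 2
    low = [1 if r < median else 0 for r in ranks]                 # step 4.1
    high = [1 - f for f in low]                                   # step 4.2
    low_pos = exclusive_prefix_sum(low)                           # step 4.3
    high_pos = [median + p for p in exclusive_prefix_sum(high)]   # steps 4.3, 4.4
    # Destination of every element (steps 4.5 and 4.6 share one table here).
    dest = [low_pos[i] if low[i] else high_pos[i] for i in range(n)]

    def scatter(vec):
        out = [None] * n
        for i, d in enumerate(dest):  # iterations are independent of each other
            out[d] = vec[i]
        return out

    return scatter(ranks), [scatter(v) for v in companions]


def renumber(half, n):
    """Map the values of `half` (taken from 0..n-1) to 0..len(half)-1, keeping order."""
    flags = [0] * n
    for value in half:
        flags[value] = 1          # 1 where the value is present in this half
    prefix = exclusive_prefix_sum(flags)
    return [prefix[value] for value in half]


x = [5, 2, 7, 0, 3, 6, 1, 4]      # rank vector for dimension X (made up)
y = [3, 0, 6, 5, 2, 1, 7, 4]      # rank vector for dimension Y (made up)
x_split, (y_split, index_split) = median_split(x, [y, list(range(8))])
left_y = renumber(y_split[:4], 8)   # ready for the next split along Y
right_y = renumber(y_split[4:], 8)
```

With an exclusive prefix sum, the renumbered value of an element is the number of smaller values present in the same half, which is consistent with the example of the description where the value 11 is renumbered 6.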
A new flag vector must be calculated and its prefix sum taken to obtain the result illustrated by vector 6.5. Note that the result vector 6.5 contains the indexes of vector 6.2 according to Y in the same order, but with consecutive values between 0 and 7. Once the half-vectors have been renumbered, the next step of splitting each of the half-vectors can proceed, and so on. A classic kd-tree construction would continue splitting down to the level of the leaves, that is, the leaves would contain only one element of the collection of points. To make the most of the parallelism of the graphics processing unit, it is on the contrary more advantageous to stop splitting earlier. In the exemplary embodiment, splitting stops as soon as the node is associated with a collection of points whose size in memory is smaller than the size of the local memory of the scalar processors. The leaves of the tree thus contain a collection of points that fits entirely in this local memory for the calculation. Alternatively, a size equal to the number of threads that can be executed simultaneously by the scalar processor can be chosen. This number of points also typically fits in the local memory and is typically smaller than the number of points obtained by the first alternative. In this way, the processing can be implemented so that one thread handles the calculations for a single point of the collection. Search operations in the collection consist in selecting the points whose attributes meet a criterion. Typically, the search amounts to determining the points present in a hypercube of dimension N, N being the number of attributes. This hypercube is defined by two points of dimension N. The search is performed by traversing the tree from the root. This traversal proceeds depth by depth down to the leaves. At each depth, the nodes that have no intersection with the searched hypercube in the dimension associated with the depth are eliminated.
As seen above, two points defining the bounding window of the collection of points to be distributed are advantageously stored at the parent node. These points are used here to determine whether this collection intersects the hypercube. The search continues by testing, at the next depth, the children of the nodes that have not been eliminated. The search proceeds down to the leaves. The list of leaves intersecting the sought hypercube is then obtained. It then suffices to test the points of these leaves to obtain the result. This search is all the more efficient as the points of a leaf can be transferred into the local memory of the scalar processors. The search can be carried out according to the following method. In a first step 7.1, a vector of length equal to the number of leaves in the kd tree is initialized to 0, and only the value at index 0 is set to 1. This vector represents the nodes to search during the current operation, and therefore for a given depth. This initialization amounts to starting the search at the root. In a second step 7.2, the nodes flagged 1 are tested for the current depth, starting with the root. If a node intersects the hypercube, the indexes of its children are set to 1, unless the node is a leaf. If the parent has index n in the initial array, its children have indexes 2n and 2n+1 in the resulting array. In a third step 7.3, the prefix sum of the vector is calculated in an auxiliary vector, whose last element gives the number of nodes remaining to be searched. If this number is zero, the search stops; otherwise it loops back to the second step for the next depth.

In a fourth step 7.4, the points of the selected leaves are tested to obtain the result. When these methods are used to implement a geographic information system, the search results are typically graphics primitives to be displayed on screen. The graphics processing unit is then typically used to display the results in the form of a window containing the graphical representation of a geographical area. However, graphics processing units now make it possible to transfer information from the global memory of the GPU to the memory of the computer's central processor.
It is therefore entirely feasible to use the methods described in any type of database and to exploit the search results on the central processor. The methods are therefore not limited to use in geographic information systems, but can be used to implement general-purpose databases. All these algorithms can be parallelized efficiently on a graphics processing unit and achieve a higher level of performance than on a conventional processor. The exemplary embodiment shows a gain of a factor of 10 to 20 compared with a conventional implementation on a central processor.
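The four search steps 7.1 to 7.4 described above can be sketched sequentially in Python. This is a minimal illustration, not the patented implementation: the names `boxes_intersect`, `search`, `levels` and `leaf_points` are hypothetical, the tree is assumed to be complete with the bounding boxes stored level by level, and the intersection test checks every dimension rather than only the one associated with the current depth.

```python
from itertools import accumulate


def boxes_intersect(lo_a, hi_a, lo_b, hi_b):
    """True if two axis-aligned boxes overlap in every dimension."""
    return all(la <= hb and lb <= ha
               for la, ha, lb, hb in zip(lo_a, hi_a, lo_b, hi_b))


def search(levels, leaf_points, q_lo, q_hi):
    """Return the points lying inside the query box [q_lo, q_hi].

    levels[d][n] is the (lo, hi) bounding box of node n at depth d;
    the last level holds the leaves. leaf_points[n] lists the points of leaf n.
    """
    num_leaves = len(levels[-1])
    current = [0] * num_leaves
    current[0] = 1  # step 7.1: only the root is flagged
    for depth, boxes in enumerate(levels):
        is_leaf_level = depth == len(levels) - 1
        nxt = [0] * num_leaves
        for n in range(len(boxes)):  # step 7.2: one thread per node on a GPU
            if current[n] and boxes_intersect(*boxes[n], q_lo, q_hi):
                if is_leaf_level:
                    nxt[n] = 1  # a leaf has no children: keep the leaf itself
                else:
                    nxt[2 * n] = nxt[2 * n + 1] = 1
        # Step 7.3: the last element of the prefix sum counts the flagged nodes.
        remaining = list(accumulate(nxt))[-1]
        current = nxt
        if remaining == 0:
            return []
    # Step 7.4: test the points of the selected leaves.
    return [p
            for n, flag in enumerate(current) if flag
            for p in leaf_points[n]
            if all(lo <= x <= hi for x, lo, hi in zip(p, q_lo, q_hi))]
```

The flag vectors keep a fixed length equal to the number of leaves, as in the description, so each depth can be processed by the same parallel kernel regardless of how many nodes are still active.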

Claims

1 / A method for storing a collection of points of a multidimensional space in the memory of an information processing device, characterized in that it comprises: a preliminary step of creating one rank vector per dimension and an index vector, said points of the collection being stored in an array, said index vector containing the index in the array of each point, each rank vector containing the index of the corresponding point in an array where the points would be sorted according to the corresponding dimension; a step of constructing a kd tree representing said collection of points, the points being distributed at each node into values lower and values greater than the median of the values along one of the dimensions, the distribution being made on the rank vectors and the index vector, the points assigned to each child not being sorted among themselves, and the array storing the collection of points not being modified.
2 / A method according to claim 1, characterized in that the step of constructing the kd tree comprises: a step of distributing the values of a rank vector into values lower and values greater than the median; a step of carrying over said distribution to the other rank vectors and to the index vector; a step of renumbering the values distributed within each half-vector of another rank vector, corresponding to the dimension along which the next distribution step is to be performed.
3 / A method according to claim 2, characterized in that the distribution step comprises: a step (4.1) of calculating a first vector of binary flags of the same length as the vector to be distributed, each value of which is the result of the comparison of the corresponding value of the array with the value of the index of the median; a step (4.2) of calculating a second vector of binary flags of the same length as the vector to be distributed, each value of which is the inverse of the corresponding value of the first vector of flags; a step (4.3) of calculating the prefix sums of these two flag vectors; a step (4.4) of adding the value of the index of the median to all the values of the prefix sum of the second vector of flags; a step (4.5) of moving the indexes lower than the median to the index corresponding to the value of the first prefix sum; a step (4.6) of moving the indexes greater than the median to the index corresponding to the value of the second prefix sum.
4 / A method according to claim 2, characterized in that the renumbering step comprises, the vector to be renumbered (6.2) being of length N/2 and containing values between 0 and N-1: a step of constructing a flag vector (6.3) of length N containing, for each index, the Boolean value 1 if the index is present in the array and 0 otherwise; a step of constructing the prefix sum (6.4) of said flag vector; a step in which each value of the initial array (6.2) is replaced by the value of the array of prefix sums (6.4) whose index corresponds to said value of the initial array.
5 / A method according to one of claims 1 to 4, characterized in that, said information processing device comprising a plurality of scalar processors having a local memory, the step of constructing the kd tree is interrupted as soon as the collection of points associated with a leaf of the tree has a memory size smaller than the size of this local memory.
6 / A method according to one of claims 1 to 5, characterized in that each node of the tree stores two points representing a hypercube encompassing the collection of points associated with this node.
7 / A method of searching, within a collection of points of a multidimensional space, for points belonging to a hypercube of said space, the collection of points being stored in the form of a kd tree, characterized in that it comprises: a step (7.1) of initializing to zero a vector of length equal to the number of leaves in the kd tree, the first value being set to one, this vector representing, for a given depth of the tree, the indices of the nodes to be searched; a test step (7.2), for each node whose index is at 1 for the current depth, starting with the root, in which the indices of the children of the tested node are set to 1 if the hypercube encompassing the points associated with this node has a non-empty intersection with the sought hypercube; a step (7.3) in which it is tested whether nodes remain to be searched and, if so, the method loops back to the test step for the next depth; a step (7.4) in which the points associated with the selected leaves are tested to obtain the result.
8 / A database management system comprising a graphics processing unit and means for storing a collection of points of a multidimensional space in the memory of an information processing device, characterized in that it comprises: means for creating one rank vector per dimension and an index vector, said points of the collection being stored in an array, said index vector containing the index in the array of each point, each rank vector containing the index of the corresponding point in an array where the points would be sorted according to the corresponding dimension; means for constructing a kd tree representing said collection of points, the points being distributed at each node into values lower and values greater than the median of the values along one of the dimensions, the distribution being made on the rank vectors and the index vector, the points assigned to each child not being sorted among themselves, and the array storing the collection of points not being modified.
9 / A database management system comprising a graphics processing unit and means for searching, within a collection of points of a multidimensional space, for points belonging to a hypercube of said space, the collection of points being stored in memory in the form of a kd tree, characterized in that it comprises: means for initializing to zero a vector of length equal to the number of leaves in the kd tree, the first value being set to one, this vector representing, for a given depth of the tree, the indices of the nodes to be searched; means for testing each node whose index is at 1 for the current depth, starting with the root, and for setting to 1 the indices of the children of the tested node if the hypercube encompassing the points associated with this node has a non-empty intersection with the sought hypercube; means for testing whether nodes remain to be searched and, if so, for looping back to the test step for the next depth; means for testing the points associated with the selected leaves to obtain the result.