WO2022175602A1

WO2022175602A1 - Method for geolocating and characterising signalling infrastructure devices

Info

Publication number: WO2022175602A1
Application number: PCT/FR2021/050290
Authority: WO
Inventors: Romain D'ESPARBES; Cédrik FERRERO
Original assignee: Geosat
Priority date: 2021-02-18
Filing date: 2021-02-18
Publication date: 2022-08-25

Abstract

The invention relates to a method for geolocating discrete objects from panoramic images, comprising a succession of steps involving capturing the panoramic images by means of one or more moving vehicles carrying a panoramic camera device, a succession of steps involving recognising and classifying objects in the panoramic images, a succession of steps involving geolocating the classified objects using a method for triangulating the objects based on at least two panoramic images containing the classified objects and the positions (100a, 100b) of the panoramic cameras while capturing the panoramic images containing the classified objects.

Description

The present invention relates to a method for the geolocation and qualification of signaling infrastructure devices, comprising the detection and classification of discrete objects such as infrastructure devices, and in particular traffic signs, with a view to integrating this data into a high-resolution map.

Technical Field

[0002] The invention relates to the field of producing high-definition 3D maps for the geolocation of autonomous vehicles by combining GPS/GNSS data, a high-definition map and movement data, in particular from an inertial measurement unit.

Prior Art

[0003] An autonomous vehicle must know its position relative to its environment in order to move. It can be localized by means of GPS/GNSS. However, when GPS/GNSS is unavailable or the satellite signals are disturbed, other methods are needed to help the vehicle determine its position. A high-definition (HD) on-board map fills these information gaps while providing a reliable description of the vehicle's 3D environment.

[0004] Current solutions for generating high-definition 3D maps covering vast regions are tedious and require significant working time, and the volume of data required for these maps remains high. In parallel, automated object recognition in LIDAR images and point clouds has progressed in recent years.

[0005] Mapping systems comprise at least three types of data: geolocated data, 3D spatial measurements and 360° visualization with panoramic optical images. High-quality geolocation measurements are the foundation for creating accurate and precise maps. To this end, LIDAR scanners provide dense, high-accuracy 3D point clouds that represent the morphology of the environment, while panoramic images provide complementary information: textures, signage, text and other visual details.

[0006] Together, this data helps to create dense and highly detailed 3D maps.
Mobile mapping systems generate more than 1 gigabyte of data per 50 m, which makes data management and processing complex and requires large-capacity data storage platforms and powerful computing and video processing resources for the processing, visualization, navigation and manipulation of this data.

[0007] The processing and analysis of mobile mapping data for the production of high-definition maps is difficult and time-consuming for several reasons: (1) the heterogeneous and unstructured nature of 3D point clouds, which are a relatively recent technology, makes them difficult to analyze; (2) 2D images taken alone lack spatial information; (3) most classification and feature extraction methods require significant intervention by a human analyst; (4) the stand-alone software used to make HD maps is inherently designed as a vertical solution; and (5) the recognition of discrete objects such as signaling devices and their geolocation is complex.

[0008] To extract certain objects such as discrete objects, in particular of the panel type or other signaling elements, it is preferable to use images rather than a point cloud, in particular because point clouds are not suitable for sign recognition.

[0009] This requires recognition of these discrete objects and their geolocation.

[0010] For geolocation, it is known, for example from document US2018/188060 A1, to combine camera images with depth maps or point clouds from detection and ranging sensors such as lidars. Such a technique requires processing a large mass of data.

[0011] Document WO2018/140656 A1, for its part, describes combining panoramic images with distance data obtained by ranging devices, for example of the lidar or radar type, to geolocate objects.
Finally, the document FOOLADGAR FAHIMEH ET AL: "Geometrical Analysis of Localization Error in Stereo Vision Systems", IEEE SENSORS JOURNAL, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 13, no. 11, November 1, 2013 (2013-11-01), pages 4236-4246, XP011527941, proposes methods for calculating the position of targets by means of stereo images.

Presentation of the invention

[0013] The present application deals more particularly with the geolocation of signaling devices, in particular road signs, in images and in particular in panoramic images. It proposes a method for geolocating discrete objects in global reference frames from images produced by a spherical, non-stereo imaging system with positioning and geolocation sensors on board a moving vehicle. The method of the invention uses unitary panoramic images, that is to say non-stereo, geolocated spherical or semi-spherical images taken successively, and combines an object classification part based on neural networks with an algorithmic part that calculates the position and dimensions of the classified objects, so as to simplify image capture and reduce the amount of data to be processed.
[0015] More specifically, the present invention proposes a method for the geolocation of discrete objects from a succession of geolocated unitary panoramic images taken by a 360° panoramic camera device of one or more mobile vehicles along the route of said vehicles, characterized in that it comprises:
- at least one succession of steps for taking said unitary semi-spherical or spherical panoramic images by a mobile vehicle carrying a panoramic camera device equipped with a geolocation system,
- a succession of steps for recognizing and classifying objects in said unitary panoramic images, carried out by means of deep learning models based on neural networks, producing a detection and classification of objects in the panoramic images and producing the definition of selection frames around subsets of pixels which contain the detected objects in the panoramic images and, for each object in each panoramic image, a classification code and the coordinates i (vertical), j (horizontal), in number of pixels, of a position marker of the selection frame of width w and height h containing said object in said panoramic image;
- a succession of steps for the geolocation of the classified objects by a process of triangulation of said objects from at least two distinct panoramic images containing said classified objects and from the positions of the panoramic cameras when taking the panoramic images containing said classified objects, said geolocation steps comprising, for a series of panoramic images:
  - a step of determining, in each panoramic image of said series in which an object classification is present, the angular position of the detected objects with respect to the shooting center,
  - at least one change of reference frame from the coordinates of the objects in the images to terrestrial Cartesian coordinates according to the coordinates of the cameras at the time of the shots,
  - a projection of cones, each directed towards the center A of the selection frame of said object, having as its apex the center of the camera device and an aperture angle α defined by the radius r estimated per image,
  - a calculation of intersections of pairs of cones generated in at least two panoramic images comprising said object and taken at at least two different places;
so as to calculate by triangulation a spatial distance between the particular object detected and the positions of the centers of the camera device at said at least two different places and to calculate a geolocated position of said particular object detected.

The method of the invention combines a classification part based on deep learning and a neural network with an algorithmic part for calculating the position and confirming the validity of the positions of the objects, in order to geolocate said objects from unitary 360° semi-spherical panoramic images without the need for distance measurements or stereo shots.

[0016] With the camera device carried by the mobile vehicle or vehicles comprising several cameras distributed spherically and suitable for taking photos that are synchronized, dated and located by location and dating means such as a GPS/GNSS system, the method can include taking photos periodically with the cameras of said camera device during the movement of the vehicle carrying them, said synchronized, dated and geolocated photos being processed in software suitable for constructing spherical panoramic images from said photos.

[0017] With the camera device comprising five cameras pointing in directions spaced 72° apart in a horizontal plane around an axis that is vertical with respect to the rolling plane of the vehicle, and a camera pointing upwards along the vertical axis, the simultaneous photos taken by the six cameras are stitched together to produce a 360° spherical panoramic photo in a horizontal plane.
[0018] The method may include the fixing of an origin point with coordinates (0, 0) in the panoramic images, which are W pixels wide by H pixels high, a step of recognizing and classifying said objects by means of deep learning models based on neural networks, producing a selection frame around a subset of pixels which contains the object detected in the panoramic image as well as its classification code, and a step of storing:

[0019] - said selection frame of each detected object P,

[0020] - the height h and the width w of this frame, and

[0021] - the coordinates i (vertical), j (horizontal), in number of pixels, of a position marker of the selection frame containing said object associated with the panoramic photos, with respect to said origin point.

[0022] The method may comprise defining the origin point of coordinates (0, 0) of the panoramic image as the top-left point of the panoramic image and defining the position marker i, j of the selection frame, relative to the origin (0, 0), as the top-left point of the selection frame.
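
The record stored per detection in paragraphs [0018] to [0022] can be pictured as a small data structure. The following is a minimal sketch, not the applicant's implementation: the class name `Detection`, the field `image_name` and the sample values are hypothetical, while `class_code`, `i`, `j`, `w` and `h` follow the text, with (i, j) taken as the top-left corner of the selection frame.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One classified object in one panoramic image (hypothetical layout)."""
    image_name: str  # hypothetical identifier of the panoramic image
    class_code: str  # classification code predicted by the model
    i: int           # vertical pixel coordinate of the frame's top-left corner
    j: int           # horizontal pixel coordinate of the frame's top-left corner
    w: int           # frame width in pixels
    h: int           # frame height in pixels

    def center(self):
        """Estimated center A of the object: (Ai, Aj) = (i + h/2, j + w/2)."""
        return (self.i + self.h / 2, self.j + self.w / 2)

    def radius(self):
        """Estimated radius r: half of the smaller frame dimension."""
        return min(self.w, self.h) / 2


# Hypothetical sample values, for illustration only.
det = Detection("pano_0001.jpg", "stop_sign", i=400, j=900, w=40, h=60)
```

Keeping the record immutable is a design choice of this sketch; it makes it safe to share the same detection between several cone pairs later on.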
[0023] The triangulation method may comprise, for a series of panoramic images, a step of determining the angular position, with respect to the shooting center, of the objects detected in each panoramic image of said series in which an object classification is present, which comprises, from the dimensions of the selection frame:
- a calculation of the position of the estimated center A of each object in the panoramic image;
- a calculation of an estimated radius r of said object such that r is equal to half of the smaller of the width w and height h of the selection frame in the panoramic image, and a calculation of the position of a point B on the selection frame closest to point A, such that AB = r;
- a calculation of the angular position of the center A of the image of said object and of point B with respect to the center O of the group of cameras 100 in spherical camera coordinates, the panoramic image being treated as a sphere with center O; then
- the calculation of a vector $\vec{OA}$ joining the center O of the cameras and the center A of the object, the calculation of a vector $\vec{OB}$ joining the center O of the cameras and point B, and the calculation of a cone of axis $\vec{OA}$ and apex angle α such that

[Math.1]

$$\alpha = \arccos\left(\frac{\vec{OA}\cdot\vec{OB}}{\lVert\vec{OA}\rVert\,\lVert\vec{OB}\rVert}\right)$$

in Cartesian camera coordinates;
- a transformation into terrestrial Cartesian global coordinates and a storing of the coordinates of the vectors $\vec{OA}$, of the center O of the cameras and of the angle α for each object of each panoramic image.

[0024] The step of determining the position of the detected objects can in particular use:
- the position parameters of the cameras, which include the geolocated Cartesian coordinates (x0, y0, z0) of the center O of the camera device and the orientation (Ω, Φ, Κ) of each of the cameras of the camera device;
- the height and width H, W of the original panoramic image;
- the borders of the selection frames containing the objects detected in the panoramic images.

[0025] The panoramic image being considered as the upper part of the surface of a sphere with the camera device at the center of this sphere, the triangulation method preferably comprises determining the position of a particular detected object by projecting a cone directed towards the center A of the selection frame of said object on said upper part of said surface of a sphere, said cone having as its apex the center of the camera device and the aperture angle α, and comprises, for a given particular detected object, a calculation of intersections of pairs of cones generated in at least two distinct panoramic images comprising said object and taken at at least two different locations U0, V0, so as to calculate by triangulation a spatial distance between the particular object detected and the positions of the centers of the camera device at said at least two different places and to calculate a geolocated position of said particular object detected.

[0026] Preferably, only the pairs of cones whose apexes are separated by less than a distance parameter D are taken into account, the distance parameter D being expressed in meters and chosen according to the spatial frequency of the photos, in order to reduce the number of false detections.
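
The cone construction of paragraph [0023] can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's code: it assumes an equirectangular panorama whose width W spans 2π and whose height H spans π, with φ = π − 2πAj/W and θ = πAi/H as in the detailed description. The function names are hypothetical, and the choice of B (a point on the nearest side of the frame, r pixels from A) is one reading of the text.

```python
import math


def pixel_to_angles(ai, aj, W, H):
    """Map pixel (ai vertical, aj horizontal) to spherical angles (theta, phi)."""
    phi = (math.pi - 2 * math.pi * aj / W) % (2 * math.pi)
    theta = math.pi * ai / H  # 0 at the top row, pi at the bottom row
    return theta, phi


def unit_vector(theta, phi):
    """Cartesian camera coordinates (X, Y, Z) of the direction, with rho = 1."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def bbox_to_cone(i, j, w, h, W, H):
    """Return (axis OA as a unit vector, apex angle alpha) for one frame."""
    ai, aj = i + h / 2, j + w / 2  # estimated center A
    r = min(w, h) / 2              # estimated radius in pixels
    if w <= h:
        bi, bj = ai, aj + r        # nearest side is a vertical side
    else:
        bi, bj = ai + r, aj        # nearest side is a horizontal side
    oa = unit_vector(*pixel_to_angles(ai, aj, W, H))
    ob = unit_vector(*pixel_to_angles(bi, bj, W, H))
    dot = sum(a * b for a, b in zip(oa, ob))
    alpha = math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding
    return oa, alpha


# Hypothetical frame centered in a 2000 x 1000 panorama.
oa, alpha = bbox_to_cone(i=480, j=990, w=20, h=40, W=2000, H=1000)
```

Because the axis is a unit vector, the normalisation in [Math.1] reduces to a plain dot product here; the clamp only guards against rounding slightly outside [-1, 1].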
[0027] The method advantageously comprises a validity analysis of the 3D intersections IJ of said cone projections, in order to reduce the number of candidate objects detected, to identify the uniqueness of the objects and to determine whether the intersections are sufficiently close, a validity condition of a candidate object being that the direction vectors $\vec{u}$ and $\vec{v}$ of the pair of cones are separated by a distance less than a parameter d at the intersection IJ of the cones, the method further comprising extracting as valid objects the representatives satisfying said validity analysis, storing their position in the global coordinate system, storing their predicted classification and storing their dimension. The validity analysis advantageously includes a search for intersection nodes and a proximity calculation between the axes of the cones to determine whether the intersections are sufficiently close, by generating a KD tree of the points I and J and selecting from said KD tree the pairs of cones having a minimum distance less than the parameter d at the intersection IJ of the cones, the value in meters of said parameter d being chosen according to the size of the classes of objects detected. For the condition that the intersections are sufficiently close, the analysis comprises an algorithm for identifying the points I and J of closest approach on the direction vectors $\vec{u}$ and $\vec{v}$ of the pairs of intersecting cones, an algorithm for constructing spheres of influence P, Q around the points I and J, called attachment points, whose radius is the radius of the cone measured perpendicular to the direction axis of the cone, and an algorithm for validating the condition if the points I and J are mutually contained in the sphere of influence of the opposite cone, the axes of the cones then being contained in the opposite cone at the intersection.
[0029] The method may comprise, for all the pairs of cones whose intersections are sufficiently close and whose direction vectors are mutually contained in the cones, and when two or more attachment points have been found, an analysis for detecting parasitic intersections, said analysis comprising the production of a graph:
- in which each attachment point constitutes a node,
- in which two nodes are connected if they satisfy the conditions of sufficiently close intersections and direction vectors mutually contained in the cones,
said analysis further comprising, for each connected component of the graph, choosing the node with the maximum degree as the representative of a single object, sorting said representatives by decreasing degree of connectivity, then an iterative process of validating the highest-order representative and subtracting it together with the cones converging towards it, analyzing the convergences of the remaining cones and deleting the representatives left without converging cones, said iterative process being repeated until all the representatives have been validated or invalidated.

Brief description of the drawings

[0030] Other characteristics, details and advantages of the invention will appear on reading the detailed description below and on analyzing the appended drawings, in which:

[0031] [Fig. 1] shows an example of a panoramic photograph;

[0032] [Fig. 2] shows a diagram of the camera reference system in spherical coordinates;

[0033] [Fig. 3] shows a diagram of the global reference system in Cartesian coordinates;

[0034] [Fig. 4] shows an angular camera registration diagram;

[0035] [Fig. 5] shows a diagram of an image projected onto a sphere centered on the cameras;

[0036] [Fig. 6] shows panel processing according to one aspect of the application;

[0037] [Fig. 7] shows a representation of two solid angles converging towards a triangulated point;

[0038] [Fig. 8] shows an example of detection of objects on a trajectory;

[0039] [Fig. 9] shows a flowchart of the steps of a first method of the application;

[0040] [Fig. 10] shows a flowchart of the steps of a second method of the application;

[0041] [Fig. 11] is a schematic representation of an imaging vehicle;

[0042] [Fig. 12] shows a graph corresponding to the detection of objects in FIG. 8.

Description of the embodiments

The drawings and the description below describe non-limiting example embodiments useful for understanding the invention. For the generation of HD maps, it is desirable to detect and classify roads and road signs from panoramic images. The panoramic images 11, as shown in Figure 1, are taken from mobile vehicles 1, shown schematically in Figure 11, equipped with panoramic cameras 2a, 2b, for example an assembly of cameras called a panoramic camera device, which may in particular comprise five cameras 2a pointing in directions spaced 72° apart in a horizontal plane around an axis that is vertical with respect to the rolling plane of the vehicle, and a camera 2b pointing upwards along the vertical axis. The cameras take photos that are synchronized, dated and located by location and dating means such as a GPS/GNSS system 3 and an inertial unit. The GNSS/GPS antenna is normally positioned as close as possible to the inertial unit, on the vertical axis of the inertial unit, and in the immediate vicinity of the camera block. The vehicle can also be equipped with a LIDAR device that scans the environment to generate point clouds for making 3D maps. The six photos taken by the cameras are stitched together to produce a 360° panoramic photo in a horizontal plane.

[0047] Vertically, the lower part of the photo contains no information because of the field of view of the cameras, but it is used to complete the image over 180°. The image is referenced so that the point of coordinates (0, 0) is its top-left point, and it is W pixels wide by H pixels high, for example 1920 pixels horizontally and 1080 pixels vertically.
[0048] The cameras take photos in a synchronized manner during the movement of the vehicle carrying them, for example every two meters. The unit images are processed in software adapted to construct a panoramic image from the six simultaneous images.

[0049] The problems associated with recognizing objects such as panels and positioning them on a map from panoramic images are, on the one hand, knowing whether the images of panels that recur in several panoramic images correspond to the same panel or to different panels and, on the other hand, the impossibility of knowing the real position of a panel from its image in a single panoramic image. The following description takes the example of traffic signs, but the method described applies to any discrete object whose geolocation is desired, such as traffic lights or bus stop shelters. The method has also been tested in railway environments to geolocate terminals and devices on the ground. Point-like ground markings could also be geolocated by this method, such as the squares of "give way" lines, the bicycle symbols on cycle lanes, or manholes. However, the larger the objects to be detected, the less accurate the geolocation.

[0052] In general, the information needed for a panoramic image comprising at least one object to be geolocated is: the predicted class of the detected object, the pixel coordinates of a selection frame of a sub-image comprising the object to be geolocated, the dimensions h, w of the selection frame of the sub-image, the name of the panoramic image, its orientation and its geolocation. A step prior to the method of the present application is the creation of an object recognition model MRP 400, for objects such as panels, in the form of a neural network trained on a learning base which contains examples of the classes of interest of these objects.
This panel recognition model 400 is then used to recognize the panels in the panoramic images to be processed. The classes of objects are in particular groupings of the types of objects to be found. Depending on the size of the database and the number of samples, a class can contain a single type of object or a family of objects. For example, for signs, a class can contain the stop sign type, but also a family of signs such as direction signs. The granularity of the classes may in particular vary according to the quantity of data available or to be processed. According to the present application, the method begins with a succession of steps for taking panoramic images 11 by one or more mobile vehicles 1 carrying the panoramic camera devices discussed above. An example of a panoramic image is given in figure 1. In this image, the coordinates i, j in pixels of a point P are Cartesian coordinates from an origin point (0, 0) at the top left of the image.

[0056] The image is W pixels wide, which corresponds to 360°, and H pixels high, which corresponds to 180°, with a lower black band 10 on the part of the image masked by the vehicle.

[0057] Once the images have been taken and saved in a database or a file 500, according to FIG. 9, inference steps run by the model on the new images to be processed carry out the recognition and classification 410 of objects in the panoramic images by means of the neural network. This makes it possible to recognize the classes of the detected objects, for example classes of signs, the class of traffic lights, etc. Each object detected in each image is associated with its detected class. Similarly, a bounding box, as shown in Figure 6 discussed below, gives the maximum dimensions of the object in the corresponding image. A database or a file 300 containing the classified objects in the images, the photos and the positions of the cameras is then created.
[0059] Next, an important part of positioning the objects detected and classified from mobile mapping images in a 3D map is determining their geolocated coordinates. To do this, the present application proposes triangulating the objects from two or more detections of these objects. The triangulation uses the principle of parallax. A minimum of two images is required for this method to work. For each panel or object detected, it is necessary to determine its position with respect to the center of the group of cameras, with respect to the extrinsic camera parameters, and its position in the panoramic image.

[0061] Specific information associated with each panoramic image is thus necessary to extract the geolocation of the classified objects:
- the camera position parameters, which include the geolocated Cartesian coordinates (x0, y0, z0) of the center O of the group of cameras 100 according to figure 3 and the orientation (Ω, Φ, Κ) of each of the six cameras in the group of six cameras 100 according to FIG. 4;
- the height and width (H, W) of the original panoramic image of Figure 1;
- the borders of all the selection frames 13 containing the objects detected in the panoramic images, such as the panel P represented in FIG. 6.

For geolocation, several transformations are necessary:
- the image pixels must be identified in the spherical camera coordinates (ρ, θ, φ) according to FIG. 2, where the cameras 100 are placed at the origin point O of the reference frame;
- the spherical camera coordinates must be translated into Cartesian camera coordinates (X, Y, Z) according to FIG. 3, because the coordinates used to locate the vehicle, and therefore the center of the camera device 100, are Cartesian coordinates; and finally
- the Cartesian camera coordinates must be translated into global coordinates (x0, y0, z0) according to figure 3 (for example the WGS-84 geodetic coordinates).
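
The last step of this chain, from Cartesian camera coordinates to global coordinates, can be sketched as a rotation followed by a translation. This is a hedged illustration only: it assumes the global frame is a local Cartesian frame, that `R` is a 3x3 rotation matrix already built from the extrinsic angles (Ω, Φ, Κ), and that its transpose is applied as stated later in the description. The function name `camera_to_global` and the sample matrix are hypothetical.

```python
def camera_to_global(vec_cam, R, origin):
    """Return origin + R^T @ vec_cam, using nested lists or tuples only."""
    # Component k of R^T @ v is the sum over m of R[m][k] * v[m].
    rotated = tuple(
        sum(R[m][k] * vec_cam[m] for m in range(3)) for k in range(3)
    )
    return tuple(o + c for o, c in zip(origin, rotated))


# Hypothetical example: R is a quarter turn about the Z axis.
R_example = [[0, 1, 0],
             [-1, 0, 0],
             [0, 0, 1]]
point_a = camera_to_global((1, 0, 0), R_example, origin=(10, 20, 30))
```

The result is the point towards which the cone axis is directed, expressed in the global frame; subtracting `origin` again gives the axis direction.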
The expected result is the determination of the geolocation of a position P of the classified objects in global coordinates (x, y, z). In the following we consider a road sign. To geolocate the panels, it is necessary to determine the position of each detected panel with respect to the center of the camera and with respect to the panoramic image. Solid angles 110 are projected from the center of the camera 100, called the apex, towards the center A of the selection frame 13 of the detected panel, called the sub-image, as represented in figure 5.

[0065] The term "sub-image" refers to a subset of the pixels of the panoramic image including the object to be geolocated, the selection frame framing the sub-image.

[0066] The position of the sub-image of the road sign with respect to the center of the camera can be considered as part of the surface of a sphere 120 (the panoramic image) with the camera device 100 at the center of this sphere. For each sub-image, solid angles, cones 110, are generated as follows. Conservatively, the aperture of a solid angle can be defined by the radial distance r between the center A of the selection frame, and hence of the traffic sign, and the nearest edge of the selection frame in Figure 6, according to the relation:

[0068] [Math.2]

[0069] $$r = \frac{\min(w, h)}{2}$$

[0070] with:

[0071] [Math.3]

[0072] $$A_i = i + \frac{h}{2}, \qquad A_j = j + \frac{w}{2}$$

Here, (i, j) are the coordinates of the top-left pixel of the sub-image 13 enclosing the panel, and h and w are respectively the height and the width of the sub-image. The coordinates of point A in the image are Ai = i + h/2 vertically and Aj = j + w/2 horizontally. This also makes it possible to define a point B of the selection frame closest to A, on the perimeter defined by the radius r. The transformation of the pixel coordinates (Ai, Aj) of the central point A of the sub-image into spherical camera coordinates (a partial transformation, because there is no depth dimension) is then carried out as follows. In the horizontal plane the angle φ is:

[0076] [Math.4]

[0077] $$\varphi = \pi - \frac{2\pi A_j}{W} \pmod{2\pi}$$

For this formula, since the center of the image corresponds to an angle φ0 = 0, 2Ajπ/W must be subtracted from π to obtain the value of the angle φ.

[0079] In the vertical plane the angle θ is:

[0080] [Math.5]

[0081] $$\theta = \frac{\pi A_i}{H} \pmod{\pi}$$

In this calculation, the width of the image corresponds to W in pixels and to 2π in spherical coordinates, and the height of the image corresponds to H in pixels and to π in spherical coordinates. For each sub-image, solid angles (cones) are generated by assigning the value 1 to the parameter ρ of the vector defining the axis of the cone in spherical camera coordinates.

[0084] To return to the global coordinate system, several transformations are carried out.

[0085] The Cartesian camera coordinates are expressed from the spherical camera coordinates in the form:

[0086] [Math.6]

[0087] $$X = \rho \sin(\theta)\cos(\varphi),$$

[0088] [Math.7]

[0089] $$Y = \rho \sin(\theta)\sin(\varphi),$$

[0090] [Math.8]

[0091] $$Z = \rho \cos(\theta)$$

[0092] With ρ = 1, two vectors $\vec{OA}$ and $\vec{OB}$ are defined such that, using the angles determined for points A and B, we have:

[0093] [Math.9]

[0094] $$\vec{OA} = \begin{pmatrix} \sin\theta_A \cos\varphi_A \\ \sin\theta_A \sin\varphi_A \\ \cos\theta_A \end{pmatrix}$$

[0095] [Math.10]

[0096] $$\vec{OB} = \begin{pmatrix} \sin\theta_B \cos\varphi_B \\ \sin\theta_B \sin\varphi_B \\ \cos\theta_B \end{pmatrix}$$

[0097] with an origin at (0, 0, 0), the center of the set of cameras.

[0098] The angle between these two unit vectors makes it possible to calculate the aperture of the solid angle from the scalar product of these two vectors, such that:

[0099] [Math.11]

$$\alpha = \arccos\left(\frac{\vec{OA}\cdot\vec{OB}}{\lVert\vec{OA}\rVert\,\lVert\vec{OB}\rVert}\right) = \arccos\left(\vec{OA}\cdot\vec{OB}\right)$$
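[0100] Finally, the vector of the cone is transformed into global coordinates:

[0101] [Math.12]

$$\begin{pmatrix} x_A \\ y_A \\ z_A \end{pmatrix} = R^{T} \begin{pmatrix} X_A \\ Y_A \\ Z_A \end{pmatrix} + \begin{pmatrix} x_0 \\ y_0 \\ z_0 \end{pmatrix}$$

[0102] where R is the rotation matrix using the extrinsic parameters of the camera:

[0103] [Math.13] [0104]

[0105] or also:

[0106] [Math.14]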
[0107] and R^T the transposed matrix.

[0108] Thus, for each panoramic image, vectors starting from the center of the group of cameras (x0, y0, z0) of FIG. 3, in global coordinates at the time this image was taken, and pointing towards the center of the panel (xA, yA, zA), in global coordinates as seen in the panoramic image, create oriented lines in space. Each panoramic image thus produces a cone centered on the line going from the center of this image to point A of an object in this image, with an aperture angle α.

[0109] For a given class of objects resulting from the classification, a search is then performed for intersections of the cones resulting from a series of panoramic images and from the trajectory data for the points at which the images containing the detected objects were taken. This makes it possible to calculate by triangulation a spatial distance between the detected object or objects and the positions of the center of the camera at the time the photos were taken. To reduce the number of false detections, only the pairs of cones whose apexes are separated by a distance 204, according to FIG. 8, less than a distance parameter D are taken into account. This search can be carried out by means of a KD tree on the points of the trajectories, which speeds up the search for pairs of cones that are close to each other. Such a KD tree is not mandatory, but it lightens the computation when the amount of data increases. D is expressed in meters and its choice depends on the maximum distance beyond which an object is considered no longer detectable in an image. The operator adapts parameter D to the spatial frequency of the shots. For example, for a panel object class and a vehicle taking pictures every 2 m, D can be set to 100 m. The periodicity of the photos is preferably determined by the distance traveled by the vehicle; for example, photos are taken every meter or every two meters traveled.
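
The apex-distance filter described above can be sketched with a brute-force pairwise search. This is a simplified stand-in for illustration: the text suggests a KD tree for larger data sets, whereas this sketch simply compares every pair. The function name `candidate_pairs` and the sample positions are hypothetical, and apexes are assumed to be Cartesian coordinates in meters.

```python
import math
from itertools import combinations


def candidate_pairs(apexes, D):
    """Return index pairs (a, b) of cones whose apexes are less than D apart."""
    return [
        (a, b)
        for a, b in combinations(range(len(apexes)), 2)
        if math.dist(apexes[a], apexes[b]) < D
    ]


# Hypothetical apex positions along a straight trajectory, in meters.
apexes = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (150.0, 0.0, 0.0)]
pairs = candidate_pairs(apexes, D=100.0)
```

The brute-force search costs a number of comparisons that grows with the square of the number of cones, which is why a spatial index becomes worthwhile as the amount of data increases.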
In certain configurations, the periodicity can be determined by a temporal frequency, the vehicle speed then determining the interval in meters between the photos. For a vehicle traveling at 50 km/h (about 13.9 m/s) with cameras taking photos at 14 fps, approximately one photo per meter is obtained, and only one photo every 2 or 3 meters could then be saved. FIG. 7 represents two vectors $\vec{u}$ and $\vec{v}$, central respectively to the cones 110a, 110b and corresponding to the detection of a panel in two images taken with the cameras in position U0 and in position V0. To reduce the number of candidates detected and identify the uniqueness of an object, the projections of the cones 110a, 110b are analyzed to determine the 3D intersection points which represent the geolocated position of the objects in space. As the cone vectors are three-dimensional, and because of inaccuracies in the measurements, the cone vectors may not intersect perfectly, which creates several close intersections. Intersections are considered valid if several conditions are met:
i - they must be sufficiently close;
ii - the direction vectors are mutually contained in the cones; and
iii - the intersection is not a parasitic intersection.

For condition i, to determine whether the intersections are close enough, a KD tree is generated and only the pairs of cones for which the minimum distance 130 between their axes, defined by the vectors $\vec{u}$ and $\vec{v}$, is less than a parameter d are considered to satisfy this condition. The value of d in meters is chosen according to the size of the objects of a class of objects detected in the images. For condition ii, according to which the direction vectors are mutually contained in the cones, the points I and J of closest approach are identified along the direction axes of the pairs of intersecting cones, and spheres of influence P, Q are constructed around these points, called attachment points, with a radius equal to the radius of the cone measured perpendicular to the direction axis of the cone. The condition is satisfied if the points I and J are mutually contained in the spheres of influence P, Q, and these points I and J are then said to be mutually connected. For all the pairs of cones which satisfy the two conditions i and ii, when two or more attachment points have been found, a position in global coordinates is calculated for each attachment point.
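
Conditions i and ii can be sketched with the standard closest-point computation between two 3D lines. This is an illustrative reading rather than the applicant's algorithm: the function names are hypothetical, the radius of each sphere of influence is taken as the distance from the apex to the attachment point multiplied by tan(α), and the requirement that both attachment points lie in front of the cameras (positive parameters) is an added assumption.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _along(p, t, d):
    """Point p + t * d."""
    return tuple(x + t * y for x, y in zip(p, d))


def closest_points(U0, u, V0, v, eps=1e-12):
    """Return (tA, tB, I, J) for lines U0 + tA*u and V0 + tB*v, or None."""
    w0 = tuple(x - y for x, y in zip(U0, V0))
    a, b, c = _dot(u, u), _dot(u, v), _dot(v, v)
    d, e = _dot(u, w0), _dot(v, w0)
    denom = a * c - b * b
    if denom < eps:  # axes are parallel: no unique closest pair
        return None
    tA = (b * e - c * d) / denom
    tB = (a * e - b * d) / denom
    return tA, tB, _along(U0, tA, u), _along(V0, tB, v)


def is_valid_intersection(U0, u, alpha_u, V0, v, alpha_v, d_max):
    """Check condition i (gap < d_max) and condition ii (mutual containment)."""
    result = closest_points(U0, u, V0, v)
    if result is None:
        return False
    tA, tB, I, J = result
    if tA <= 0 or tB <= 0:  # assumption: the object lies in front of both cameras
        return False
    gap = math.dist(I, J)
    radius_i = math.dist(U0, I) * math.tan(alpha_u)  # sphere of influence at I
    radius_j = math.dist(V0, J) * math.tan(alpha_v)  # sphere of influence at J
    return gap < d_max and gap <= radius_i and gap <= radius_j


# Hypothetical pair of cones whose axes cross at (5, 5, 0).
hit = closest_points((0, 0, 0), (1, 1, 0), (10, 0, 0), (-1, 1, 0))
```

The parameters tA and tB come from setting the two scalar products of the connecting segment IJ with the direction vectors to zero, which is the orthogonality condition used in the derivation that follows.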
A line segment IJ, orthogonal to the two lines carried by the direction vectors of the two cones, connects these two lines. The problem amounts to minimizing the squared length ||J − I||² of the segment IJ. The position of the attachment points in global coordinates is deduced from the condition that the scalar product of two perpendicular vectors is zero. [0118] A general equation expresses the position of 3D points along their respective vector: [0119] [Math.15]

[0120] As solid cones are defined by their apex M0 and a directional point MA, we have: [0121] [Math.16]

[0122] [Math.17]

[0123] A particular value of the variable t, which defines the distance of a point along the line carried by the direction vector of the cone, gives the position of the attachment point I or J respectively. [0124] As the scalar product of two mutually orthogonal vectors is zero, we have for the direction vectors of the two cones: [0125] [Math.18]

[0126] [Math.19]

[0127] The scalar products are rewritten with the general vector line equation, where: [0128] [Math.20]

[0129] Then, by evaluating the scalar product equations with the known points of the two direction vectors and equating them, the equations can be solved first for tA, then for tB, to obtain the global coordinates of the attachment points I and J. [0130] Once the parameters tA and tB are known, the radius of the spheres for each vector can be calculated with: [0131] [Math.21]

[0132] [Math.22]
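For reference, the closest-point problem described in this passage admits a standard closed-form solution. The derivation below is a sketch under assumed notation, not a reproduction of the equations of the present application: $M_0$ and $N_0$ denote the apexes of the two cones, $\vec{u}$ and $\vec{v}$ their direction vectors, $\alpha_A$ and $\alpha_B$ their half-angles, and $R_I$, $R_J$ the resulting sphere radii. All of these symbols, as well as the scalars $a$, $b$, $c$, $p$, $q$, are chosen here for illustration.

```latex
\begin{aligned}
I &= M_0 + t_A\,\vec{u}, \qquad J = N_0 + t_B\,\vec{v}, \qquad \vec{w} = M_0 - N_0,\\
(J - I)\cdot\vec{u} &= 0, \qquad (J - I)\cdot\vec{v} = 0,\\
a &= \vec{u}\cdot\vec{u},\quad b = \vec{u}\cdot\vec{v},\quad c = \vec{v}\cdot\vec{v},\quad
p = \vec{u}\cdot\vec{w},\quad q = \vec{v}\cdot\vec{w},\\
t_A &= \frac{b\,q - c\,p}{a\,c - b^{2}}, \qquad
t_B = \frac{a\,q - b\,p}{a\,c - b^{2}} \qquad (a\,c - b^{2} \neq 0),\\
R_I &= t_A\,\lVert\vec{u}\rVert\,\tan\alpha_A, \qquad
R_J = t_B\,\lVert\vec{v}\rVert\,\tan\alpha_B .
\end{aligned}
```

The two orthogonality conditions form a 2×2 linear system in $t_A$ and $t_B$; the denominator $a\,c - b^{2}$ vanishes only when the two axes are parallel, in which case no unique pair of closest points exists.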

The radius Rs is used to define the sphere of influence at point I; a similar calculation with tB gives the sphere of influence at point J. The radius Rs is thus the dimensional analogue of r in the equation r = min(w, h)/2 above.

Once the intersections of cones at points I and J have been determined, i.e. for the pairs of spheres whose centers are less than d apart and are each contained in the opposite sphere, condition iii must be verified: the intersection is not a parasitic intersection, or ghost panel. To do this, a subsequent analysis is carried out to distinguish unique intersections from parasitic, i.e. false, intersections. FIG. 8 represents a simplified case in which, on the trajectory 200, a vehicle takes photos at the positions 200a, 200b, 200c, 200d, 200e of two panels 201 and 202, the panel 201 being visible in the photos taken at points 200b, 200c, 200d, 200e while the panel 202 is visible in the photos taken at points 200a, 200b, 200c. In this case, a vector from a photo of panel 201 taken at position 200b and a vector from a photo of panel 202 taken at point 200c intersect at 203, and the intersection at point 203 corresponds to a ghost panel. This arises because a cone can intersect several other cones, each intersection generating an attachment point representing the possible position of an object. In such a case, the parasitic intersections must be deleted to keep only the valid intersections at points 201 and 202.

To delete the points corresponding to a ghost panel, a graph 600, as represented in FIG. 12, is built in which each attachment point Ap constitutes a node. The branches Br of the graph are the connections between a node and the other nodes, which together form a connected component.
For each connected component, the node Rp having the maximum degree is chosen as the representative of a unique object. The representatives are then sorted by decreasing degree of connectivity. To filter out the spurious points, the highest-order representatives are validated and removed together with the cones converging towards them. The remaining cone convergences are analyzed and, if one or more representatives are found without converging cones, these representatives are identified as the projection of a ghost or duplicated panel, called a ghost projection, and invalidated. The operation is repeated until all the representatives have been validated or invalidated. All representatives that pass this last test are then extracted as valid objects; their position in the global coordinate system is stored along with their predicted classification and dimension. Storing the measured dimension then makes it possible to check automatically whether the size of the detected objects is consistent with the theoretical size of their class.

For example, in FIG. 12, point 201 of FIG. 8 corresponds to an attachment point A1 with a representative Rp1 on the graph 600, point 202 to an attachment point A2 with a representative Rp2, and point 203 to an attachment point A3 with a representative Rp3. The representative Rp1, with 11 links, is validated and removed with its cones, then the representative Rp2, with 5 links, is validated and removed with its cones. The representative Rp3 then no longer has any cones and is therefore classified as a ghost projection and invalidated. Furthermore, the radius Rs of the highest-order representative makes it possible to evaluate the spatial dimension and the size of the object.
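The filtering of ghost projections described above can be sketched as follows. This is a simplified interpretation rather than the implementation of the present method: the function name `filter_ghosts`, the candidate and cone identifiers, and the `min_cones` threshold are hypothetical. Each candidate position is assumed to be described by the set of cones converging on it, its degree being the size of that set, and a candidate is assumed to need at least two cones not already attributed to a higher-order representative in order to be validated.

```python
def filter_ghosts(candidates, min_cones=2):
    """Split candidate positions into validated objects and ghost projections.

    `candidates` maps a candidate id to the set of ids of the cones
    converging on that candidate.
    """
    # Rank the representatives by decreasing degree (number of converging
    # cones); ties are broken by id so that the result is deterministic.
    ranked = sorted(candidates, key=lambda cid: (-len(candidates[cid]), str(cid)))
    used = set()  # cones already attributed to a validated representative
    valid, ghosts = [], []
    for cid in ranked:
        free = set(candidates[cid]) - used
        if len(free) >= min_cones:
            valid.append(cid)
            used |= set(candidates[cid])
        else:
            # No (or a single) remaining cone: treated as a ghost projection.
            ghosts.append(cid)
    return valid, ghosts

# Simplified scenario inspired by FIG. 8: "b-201" is the cone of the detection
# of panel 201 in the photo taken at 200b, and so on (hypothetical labels).
fig8 = {
    "A1": {"b-201", "c-201", "d-201", "e-201"},
    "A2": {"a-202", "b-202", "c-202"},
    "A3": {"b-201", "c-202"},  # crossing of cones belonging to two panels
}
valid, ghosts = filter_ghosts(fig8)
print(valid, ghosts)  # ['A1', 'A2'] ['A3']
```

Ranking once by initial degree and discounting the cones already consumed gives the same outcome as the iterative description in this small case: A3 loses both of its cones to A1 and A2 and is invalidated.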
To facilitate the elimination of ghost signs caused by projections of cones that cross at much greater distances or that cross cones of other traffic signs, the maximum distance D between the apexes of the cones taken into account can be adapted to the density of traffic signs: D is reduced inside cities, where the density of signs is high, compared with scenes outside cities, where the density of signs is lower. [0140] Once the traffic sign extraction algorithm has finished processing a panoramic image, the sub-images that could not be triangulated and located are recorded in a separate log for manual verification. The traffic sign localization algorithm of the present application relies on fundamental geometric transformations and on a constrained three-dimensional triangulation method. The output data are the geolocated coordinates (x, y, z) of the objects, in particular of the panels, in global coordinates, their predicted classification and a minimum dimension r. Another point dealt with by the filtering of parasitic points described above is the removal of ghost objects when several cones intersect a single cone. This arises from the sensitivity of the method to parallax at small angles, coupled with a profusion of cones which adds background noise. Sufficient distance is needed between the viewpoints (image captures) to clearly distinguish the intersections of the cone vectors during the triangulation. Furthermore, to avoid calculating intersections between cones that are too far apart, the parameter D, which reduces the interference between cones for nearby instances of panels of the same type, can be adapted to the location of the panels: D can be reduced in cities, where the panel density is greater, and increased outside cities.
Since the cones are projected to infinity along the path of the vehicle, especially in curves, roundabouts and lane intersections, reducing the search radius reduces background noise. The value of D depends on the size of the objects and the frequency of image capture. The aforementioned distance parameters D and d are set when the method is applied to an image set corresponding to an area whose panels or other objects are to be geolocated.

All of these operations are summarized in FIG. 10 which describes, starting from the database 300 of images with classified objects: the step 305 of reading the selection frames of the objects and the step 310 of reading the trajectories of the cameras having taken the images; the step 320 of constructing the cones, in parallel with the step 330 of creating the KD trees of proximity of the intersections; the step 340 of selecting the objects of a class; the step 350 of searching for the pairs of cones whose photo origins are less than D apart, taking into account the result of the step of creating the KD tree of the trajectory points; then the step 355 of searching for the intersection nodes and calculating the proximity between the axes of the cones to define the nodes corresponding to probable objects. These steps are followed by a step 360 of calculating the connected components with the generation of a graph 600 of all the node connections, then a step 365 of selecting the maximum-degree node for each connected component, followed by a step 370 of ordering the representative nodes by decreasing degree, in order to carry out step 375 of filtering out the spurious representatives by deleting the vectors already referenced by higher-order nodes and deleting the remaining single-vector nodes. At the end of this operation, the output data are the objects, for which the coordinates of the center of the object in global coordinates and the radius of the object are stored in step 380.
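The search of step 350 can be illustrated by the following sketch, which lists the pairs of camera positions less than D apart. It is a simplified stand-in, not the implementation of the present method: a uniform grid replaces the KD tree to keep the example free of dependencies, the function name `pairs_within` and the values of D are hypothetical, and the selection by object class (step 340) is omitted.

```python
import math
from collections import defaultdict
from itertools import product

def pairs_within(apexes, D):
    """Return the sorted index pairs (i, j), i < j, of 3D apexes less than D apart."""
    # Hash every apex into a cubic cell of side D: two apexes closer than D
    # necessarily fall in the same cell or in adjacent cells.
    grid = defaultdict(list)
    for idx, p in enumerate(apexes):
        grid[tuple(math.floor(c / D) for c in p)].append(idx)

    pairs = set()
    for cell, members in grid.items():
        for offset in product((-1, 0, 1), repeat=3):
            neighbour = tuple(c + o for c, o in zip(cell, offset))
            for i in members:
                # .get() avoids creating empty cells while iterating.
                for j in grid.get(neighbour, ()):
                    if i < j and math.dist(apexes[i], apexes[j]) < D:
                        pairs.add((i, j))
    return sorted(pairs)

# Five camera positions spaced 5 m apart along a straight trajectory.
trajectory = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0),
              (15.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
rural = pairs_within(trajectory, D=12.0)  # larger D outside cities: 7 pairs
urban = pairs_within(trajectory, D=6.0)   # smaller D in cities: 4 pairs
```

Reducing D from 12 m to 6 m in this example keeps only the consecutive viewpoints, which is the effect sought in dense urban scenes where distant cones mostly add spurious crossings.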
[0149] To summarize: [0150] A - The triangulation method is carried out with the following criteria: 1) at least two images in which the model has detected the same object; 2) the position in global coordinates of the center of the block of cameras is known; 3) the camera orientation angles are known, which makes it possible to calculate the position of the object detected in the image in a global frame. [0151] B - The triangulation method projects the directional vectors, with their apex at the center of the block of cameras, and searches for the crossings in 3D space as potential positions of the object. [0152] C - The uniqueness and validity of the object, as well as the execution time performance, are ensured by the introduction of two parameters: D, a search distance for the pairs of crossing vectors, which makes the algorithm more efficient in execution time, and d, a distance threshold determining that, at the crossing, the vectors are sufficiently close, which reduces the number of candidate intersection points. [0153] D - The method filters the crossings by a voting system to identify, among the candidates, the most probable position of the objects. [0154] E - After the number of votes (degree of connectivity of a position) has been calculated, the points are ranked. The projected vectors are attributed to the points in order of decreasing degree of connectivity, and the points left with no associated vectors are considered false positives rather than real positions. [0155] Once all the classes of objects present in the images have been processed, the result is a database 390 of the objects that can be integrated into a 3D map. [0156] The present application thus proposes a highly automated method for the recognition and geolocation of discrete objects such as signs, traffic lights or other infrastructure objects from panoramic images taken by one or more vehicles moving on traffic lanes.
The vehicles may in particular be motor vehicles moving on road traffic lanes or railway vehicles for geolocating railway signaling devices.

Claims

[Claim 1] Process for the geolocation of discrete objects from a succession of geolocated unitary panoramic images (11) taken by a 360° panoramic camera device of one or more mobile vehicles on the route of said vehicles, characterized in that it comprises: - at least a succession of steps (510) for taking said unitary semi-spherical or spherical panoramic images by a mobile vehicle carrying a panoramic camera device and carrying a geolocation system, - a succession of steps (410) for recognizing and classifying objects in said unitary panoramic images, carried out by means of deep learning models based on neural networks producing a detection and classification of objects in the panoramic images and producing the definition of selection boxes around subsets of pixels which contain the objects detected in the panoramic images and, for each object in each panoramic image, a classification code and the coordinates i (vertical), j (horizontal) in number of pixels of a position marker of the selection frame of width w and height h containing said object in said panoramic image; - a succession of steps (305 to 380) for the geolocation of classified objects by a process of triangulation of said objects from at least two of said distinct panoramic images containing said classified objects and from the positions of the panoramic cameras during the capture of the panoramic images containing said classified objects, said geolocation steps comprising, for a series of panoramic images, - a step of determining, in each panoramic image of said series in which an object classification is present, the angular position relative to the shooting center of the detected objects, - at least one change of reference of the coordinates of the objects in the images in terrestrial Cartesian coordinates according to the coordinates of the cameras at the time of the shots, - a projection of cones (110) made from an angle α directed towards the center A of the selection frame of said object,
having as its apex the center of the camera device and an angular opening defined by the radius r estimated per image, - a calculation of intersections of pairs of cones (110a, 110b) generated in at least two panoramic images comprising said object and taken at at least two different locations (100a, 100b), so as to calculate by triangulation a spatial distance between the particular object detected and the positions of the centers of the camera device at said at least two different locations and to calculate a geolocated position of said particular object detected. [Claim 2] Geolocation method according to claim 1 for which, the camera device carried by the mobile vehicle or vehicles (1) comprising several cameras (2a, 2b) distributed in a spherical manner and adapted to take synchronized photos, dated and localized by location and dating means (3) such as a GPS/GNSS system, the method comprises taking photos periodically by the cameras of said camera device while the vehicle carrying them is moving, said synchronized, dated and geolocated photos being processed in software suitable for constructing spherical panoramic images from said photos. [Claim 3] Geolocation method according to claim 2, for which, the camera device comprising five cameras (2a) pointing in directions spaced apart by 72° in a horizontal plane around an axis vertical with respect to the rolling plane of the vehicle and a camera (2b) pointing upwards along the vertical axis, the simultaneous photos taken by the six cameras are stitched together to produce a 360° spherical panoramic photo in a horizontal plane.
[Claim 4] Geolocation method according to any one of the preceding claims comprising the fixing of a point of origin with coordinates (0, 0) in the panoramic images, which have a width of W pixels and a height of H pixels, a step (410) of recognizing and classifying said objects by means of deep learning models based on neural networks producing a selection frame around a subset of pixels which contains the object detected in the panoramic image as well as its classification code, and a step of memorizing: - said selection frame (13) of each object P detected, - the height h and the width w of this frame and the coordinates i (vertical), j (horizontal) in number of pixels of a position marker of the selection frame containing said object, associated with the panoramic photos, with respect to said point of origin. [Claim 5] Geolocation method according to claim 4 comprising the definition of the point of origin of coordinates (0, 0) of the panoramic image as the point at the top left of the panoramic image and the definition of the position marker i, j (13a) relative to the origin (0, 0) of the bounding box as the top left point of the bounding box.
[Claim 6] Geolocation method according to claim 4 or 5 for which the triangulation method comprises, for a series of panoramic images, a step of determining the angular position, with respect to the shooting center, of the objects detected in each panoramic image of said series in which an object classification is present, which comprises, from the dimensions of the selection frame: - a calculation of the position of the estimated center A of each object in the panoramic image; - a calculation of an estimated radius r of said object such that r is equal to half of the smaller of the width w and height h of the selection frame in the panoramic image, and the calculation of the position of a point B on the bounding box and closest to point A such that AB = r; - a calculation of the angular position of the center A of the image of said object and of point B with respect to the center O of the group of cameras 100 in the spherical camera coordinates, the panoramic image being assimilated to a sphere with center O; then - the calculation of a vector $\vec{u}$ joining the center of the cameras O and the center A of the object, the calculation of a vector joining the center of the cameras O and the point B, and the calculation of a cone with axis $\vec{u}$ and vertex angle α such that

[Math.23]

in Cartesian camera coordinates; - a transformation into the terrestrial Cartesian global coordinates and a memorization of the coordinates of the vectors $\vec{u}$, of the center O of the cameras and of the angle α for each object of each panoramic image. [Claim 7] Geolocation method according to claim 6 for which the step of determining the position of the detected objects uses: - the position parameters of the cameras, which include the geolocated Cartesian coordinates (x0, y0, z0) of the center O of the camera device and the orientation (Ω, Φ, Κ) of each of the cameras (2a, 2b) of the camera device; - the height and width H, W of the original panoramic image; - the borders of the selection frames (13) containing objects detected in the panoramic images. [Claim 8] Geolocation method according to claim 6 or 7 for which, the panoramic image being considered as the upper part of the surface of a sphere (120) with the camera device (100) at the center of this sphere, the triangulation method includes determining the position of a particular detected object by means of a projection of a cone (110) directed toward the center A of the bounding box of said object onto said upper part of said surface of a sphere, said cone having as its apex the center of the camera device and the angular aperture α, and comprises, for a given particular detected object, a calculation of intersections of pairs of cones (110a, 110b) generated in at least two distinct panoramic images comprising said object and taken at at least two different locations U0, V0, so as to calculate by triangulation a spatial distance between the particular detected object and the positions of the centers of the camera device at said at least two different locations and calculate a geolocated position of said particular detected object.
[Claim 9] Geolocation method according to claim 8 for which only the pairs of cones whose distance between their apexes is limited by a distance parameter D are taken into account, the distance parameter D being expressed in meters and chosen as a function of the spatial frequency of taking photos to reduce the number of false detections. [Claim 10] Geolocation method according to claim 8 or 9 for which the method comprises a validity analysis of the intersections IJ in 3D of said projections of cones in order to reduce the number of candidate objects detected, to identify the uniqueness of the objects and to determine whether intersections are close enough, one condition for the validity of an object candidate being that the direction vectors of the pairs of cones (110a, 110b) are separated by a distance less than a parameter d at the intersection IJ of the cones, the method further comprising extracting as valid objects the representatives satisfying said validity analysis, storing their position in the global coordinate system, storing their predicted classification, and storing their dimension. [Claim 11] Geolocation method according to claim 10 for which the validity analysis includes a search (355) for intersection nodes and a calculation of proximity between the axes of the cones to determine whether intersections are sufficiently close, by means of the generation of a K-D tree of the points I and J and the selection from said K-D tree of the pairs of cones having a minimum distance (130) less than a parameter d at the intersection IJ of the cones, the value in meters of said parameter d being chosen as a function of the size of the classes of objects detected, and comprising (355), for the condition according to which the intersections are sufficiently close, an algorithm for identifying the points I and J of closest distance along the direction vectors of the pairs of intersecting cones, an algorithm for constructing spheres of influence P, Q around the points I and J, called attachment points, whose radius is the radius of the cone measured perpendicular to the direction axis of the cone, and an algorithm for validating the condition if the points I and J are mutually contained in the sphere of influence of the opposite cone, the axes of the cones then being contained in the opposite cone at the intersection. [Claim 12] Geolocation method according to claim 11 comprising, for all the pairs of cones whose intersections are sufficiently close and whose direction vectors are mutually contained in the cones opposite the intersection, and when two or more attachment points have been found, an analysis for detecting parasitic intersections, said analysis comprising a step (360) of producing a graph (600): - for which each attachment point constitutes a node, - for which two nodes are connected if they satisfy the conditions of sufficiently close intersections and direction vectors mutually contained in the cones, said analysis further comprising, for each connected component (A1, A2, A3) of the graph (600), a choice (365) of the node (Rp1, Rp2, Rp3) having the maximum degree of links as a representative of a single object and a sorting (370) of said representatives by decreasing degree of connectivity, then an iterative process of validation and removal of the highest-order representative and of the cones converging towards it, of analysis of the convergences of the remaining cones and of deletion of the representatives found without converging cones (375), said iterative process being repeated until all representatives have been validated or invalidated.