WO2022152778A2 - Method for aligning a 3D model of an object with an image comprising at least a part of said object in order to obtain an augmented reality - Google Patents

Method for aligning a 3D model of an object with an image comprising at least a part of said object in order to obtain an augmented reality

Info

Publication number
WO2022152778A2
Authority
WO
WIPO (PCT)
Prior art keywords
model
image
lines
camera
interest
Prior art date
Application number
PCT/EP2022/050605
Other languages
English (en)
Other versions
WO2022152778A3 (fr)
Inventor
Javier Martinez Gonzalez
Antoine VILLERET
Original Assignee
Agc Glass Europe
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agc Glass Europe filed Critical Agc Glass Europe
Priority to EP22701552.6A priority Critical patent/EP4278327A2/fr
Publication of WO2022152778A2 publication Critical patent/WO2022152778A2/fr
Publication of WO2022152778A3 publication Critical patent/WO2022152778A3/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality

Definitions

  • the present invention relates to the field of augmented reality.
  • the invention relates to a method for aligning a 3D model of an object with an image comprising at least a portion of said object wherein said augmented reality is obtained by the superposition of the 3D model and the image captured by a camera.
  • the invention allows a faster and more accurate alignment of the 3D model on an image, even when a user is moving and the image stream is changing.
  • the invention finds applications in multiple domains.
  • the invention can be applied to the domain of engineering and construction where it can help a user to visualize an architecture project and plan for its construction during the preparation phase. It can also help a user during the building phase and afterwards for the verification and maintenance steps.
  • a virtual world is generated by synthetic video, audio and haptic stimuli.
  • Virtual worlds can be further categorized into immersive and non-immersive environments. Immersive environments submerge the user in an artificial world where vision, audition and touch are mainly computer-generated. For instance, immersion can be achieved using Virtual Reality (VR) head-mounted displays or Cave Automatic Virtual Environments (CAVE) systems. VR can be non-immersive as well. For instance, we come across 3D computer games and Computer-Aided Design (CAD) tools where virtual avatars and objects don’t necessarily provide a sense of complete immersion.
  • CAD: Computer-Aided Design
  • AR can be defined as a method of combining real and virtual worlds to obtain an interface that enhances the visual scene normally seen by the user.
  • the overlaid virtual objects can be additive to the natural environment or destructive to the natural environment, meaning that they can mask at least part of it.
  • AR systems have many applications ranging from learning, team collaboration, and behavioral studies to industrial applications such as Maintenance Repair and Operations (MRO), crew training and robotics.
  • MRO: Maintenance Repair and Operations
  • AR can help a user in:
  • a user can use AR to display the theoretical position of a piece of equipment in the real world. The user can then compare this theoretical position with the real position where said equipment was installed. If the two positions do not match, the user will be able to tell at a glance and save time by correcting the mistake rapidly.
  • AR can be used to show an operator how to perform a procedure, typically a maintenance or service operation.
  • Step-by-step the system displays the operations to perform directly onto the real equipment.
  • the user can also navigate from one step to another, possibly take pictures and annotate them to create a report of his intervention.
  • AR is available via software applications developed for deployment on a wide range of devices such as smartphones or tablets, but it is also more and more used on wearable devices such as glasses or any other optical see-through system.
  • the system uses cameras embedded in these devices to capture an image or a video of the real world.
  • the virtual object is generally created beforehand using computer-aided design (CAD).
  • This tool is commonly used to digitally create 2D drawings and 3D models of real-world objects before they are manufactured. With 3D CAD, it is possible to review, simulate, and modify designs easily.
  • a chosen 3D model is aligned onto the captured video.
  • Alignment is one of the major challenges of AR. It relates to the ability to display 3D augmentations with the exact location, orientation and scale onto the captured objects from the real world. The quality of the alignment has a huge impact on the end-user experience.
  • the first requirement for a good alignment is the ability to recognize the target real-world object and its location relative to the user.
  • Markers are specific visual patterns visible in the real world. These markers often include additional visually encoded information, which may be used to identify them in the images or video streams. These markers are used for calibration and provide good estimates for the location and orientation of the target object relative to the camera.
  • document WO2014/114118 describes a method for detecting two-dimensional markers, such as QR codes, in a camera video stream.
  • the two-dimensional markers contain at least position information on a corresponding virtual object.
  • the system can align the virtual object with reality using the position information.
  • this technique suffers from several drawbacks. First, it requires adding markers to the reality, which is sometimes not practical. Moreover, these markers can become dirty or be torn out, making them unusable. Finally, this technique can be quite inaccurate since it largely relies on the correct positioning of small markers in the environment.
  • Pattern-based recognition refers to a technique that requires the target object to have a well recognizable texture, such as a specific pattern on a fabric. The recognition of the object is then based on finding this specific texture on the real-world object. In commercial applications, this approach is for instance used to augment product catalogues.
  • Another approach is to recognize an object using simple shapes. For instance, document US10482663 discloses an object-recognition engine which is configured to compare objects captured by the camera with a plurality of objects stored in a database. For each captured object, the engine tries to match the captured object with the shape of an object stored in the database. However, this shape-based approach becomes problematic when the object cannot be captured in its entirety.
  • Another approach overcoming this drawback is to recognize an object based on remarkable points.
  • the 3D model can be aligned over the captured image from the reality by superposing the remarkable points from the 3D model with the corresponding remarkable points in the captured image.
  • the alignment method consists in finding a corner in the 3D model and aligning it with a corresponding corner in the captured image of the real world. For instance, in order to find corners in the 3D model, it is possible to project the 3D model on a plane and use a corner detection algorithm.
  • An example of corner detection on a 2D projected image is disclosed in the scientific publication: J. Alison Noble, Finding corners, Image and Vision Computing, Volume 6, Issue 2, 1988, Pages 121-128.
  • This method using a corner as a remarkable point is mainly employed because the detection of corners in the 3D model is simple.
  • the identification of the corresponding corner in the captured image of the real world is often a hard task for the user. Indeed, given the current devices capabilities, this whole process is carried out manually, thus requiring the user to manually adapt the position and the direction of the capture device in order to align the selected corner from the 3D model onto the corresponding corner in the captured image of the real world.
  • the whole alignment process may take around 30 seconds or more.
  • the technical problem to be solved is how to develop a faster alignment method in order to match the 3D model with reality, which does not require as many manipulations from the user as existing methods.
  • the invention proposes to use a line with a predetermined orientation as a remarkable indicator.
  • Matching lines is a very old technique that was abandoned because current 3D models have become increasingly complex and contain too many different lines. As a consequence, it is very difficult for a user to select a specific line among all these lines. For instance, there are typically millions of lines in a 3D model of a building.
  • the invention relates to a method for aligning a 3D model of an object with an image comprising at least a portion of said object and captured by a camera in order to obtain an augmented reality, said method comprising the following steps:
  • a camera is a device capable of recording an image of an object.
  • a camera may include at least one sensor that detects the information used to make the image.
  • the sensor may be an intensity sensor or a depth sensor.
  • the camera may include several sensors with the same or different ways of recording an image (intensity and/or depth sensors).
  • the camera may be embedded into a device such as a tablet or a smartphone.
  • Such a method may also comprise a step of selection of the 3D model among several 3D models and/or a subpart of the 3D model prior to the step of defining the capture position.
  • the 3D model might be a large project such as for instance a building that can be separated into smaller projects such as floors or rooms.
  • selecting a smaller project will further reduce the number of selectable edges and thus reduce the time needed for a user to select an edge among all the selectable edges.
  • the step of defining a capture position is carried out on a 2D above representation of the object extracted from the 3D model.
  • This method relies on a simplified global representation of the object. It is indeed easier for a user to navigate in a 2D representation rather than in a 3D representation. The user will thus save time in selecting his capture position and reduce the overall process duration.
  • the user might be more comfortable in identifying his position on the map of a floor, rather than on a 3D model. Indeed, a user is more used to reading maps and it is more difficult to navigate in a representation with three dimensions than in a representation with only two dimensions.
  • the method comprises a step of definition of an angular position of the image relative to the object using a rotation of the 2D above representation around the capture position; the step of displaying the 3D model being simulated from said capture position and said angular position.
  • the user can rotate the 2D above representation in order to match it with his viewpoint.
  • the user generally knows his position in an environment and he is capable of orientating the 3D model relative to his position in order to ease the match between the 3D model and the reality.
  • the definition of an angular position helps in the displaying process as the system does not need to test every angular position in order to find the best way to display the 3D model and which part of it to display. Thus, this additional step saves computing time.
  • the step of displaying the 3D model simulated from said capture position is realized by extracting the capture features of said camera and by displaying the 3D model with said capture features.
  • the capture features are the intrinsic and extrinsic parameters.
  • the intrinsic parameters are the parameters intrinsic to the camera itself, such as the focal length and lens distortion.
  • the extrinsic parameters are the parameters used to describe the transformation between the camera and its external world.
  • a camera may display the captured reality in a distorted way compared to our vision.
  • a 3D model displayed using the camera capture features will present the same distortions as the camera.
  • it will be easier for a user to compare the 3D model with the captured image and to identify objects with similarities if they are distorted the same way. Therefore, the user will be quicker to select a remarkable edge from the 3D model to match with a line from the reality.
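  • By way of illustration only, the following sketch (Python with OpenCV, not part of the claimed method) shows how points of the 3D model 40 could be projected into the image plane using assumed intrinsic parameters (focal length, principal point, distortion coefficients) and extrinsic parameters (rotation and translation), so that the displayed model presents the same distortions as the camera; all numeric values are placeholders.

```python
import numpy as np
import cv2

# Assumed intrinsic parameters: focal lengths (fx, fy) and principal point
# (cx, cy) in pixels, plus lens distortion coefficients (k1, k2, p1, p2, k3).
# Real values would come from the capture-feature extraction step 14.
K = np.array([[1500.0,    0.0, 960.0],
              [   0.0, 1500.0, 540.0],
              [   0.0,    0.0,   1.0]])
dist = np.array([0.1, -0.05, 0.0, 0.0, 0.0])

# Assumed extrinsic parameters: camera pose relative to the 3D model,
# as a Rodrigues rotation vector and a translation vector.
rvec = np.array([0.0, 0.2, 0.0])
tvec = np.array([0.0, 0.0, 3.0])

# A few vertices of the 3D model 40, in model coordinates (metres).
model_points = np.array([[0.0, 0.0, 0.0],
                         [1.0, 0.0, 0.0],
                         [1.0, 2.5, 0.0]], dtype=np.float64)

# Project the vertices into the image with the same distortion as the
# camera, so the displayed model matches the camera feed.
image_points, _ = cv2.projectPoints(model_points, rvec, tvec, K, dist)
print(image_points.reshape(-1, 2))
```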
  • said 3D model is superposed to the image captured by the camera.
  • This feature allows the user to check if the alignment process is going smoothly. Indeed, if the user sees that the displayed 3D model does not match the captured reality at all, he might react more quickly using this feature because he would be able to see the differences at a single glance. On the contrary, if the 3D model were not displayed on the captured reality, the user might more easily forget what the 3D model looks like and try to match two completely different objects.
  • said 3D model is built using a polygon mesh where each polygon has at least three sides; the step of identifying remarkable edges in the 3D model being carried out by the following sub-steps: a. detecting the edges of the 3D model formed by the alignment on a same line of at least two sides of two consecutive polygons; b. selecting the edges where the line is orientated along said reference orientation; and c. selecting the edges with a length greater than a threshold value.
  • Mesh modeling is a type of modeling used to build 3D objects out of smaller components such as polygons.
  • Each polygon is a completely flat shape that is defined by the position of its points and connecting edges. Very complicated models of any shape can be built completely out of polygons. The precision and fidelity of the model can be tuned by increasing or decreasing the number of polygons in the model. Indeed, polygons are flexible and can be rendered quickly by computers.
  • the reference orientation of the edges and lines used as a remarkable indicator is vertical. Indeed, vertical lines are more distinctive and reliable than horizontal lines in the natural environment.
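  • The sketch below gives a simplified, illustrative implementation of sub-steps a to c on a triangle mesh: it collects the sides shared by two adjacent triangles, keeps those aligned with the vertical reference orientation and discards those shorter than a threshold. The chaining of collinear shared sides of consecutive triangles into a single longer edge 41 is only indicated by a comment; all function names and threshold values are assumptions.

```python
import numpy as np

def remarkable_edges(vertices, triangles, ref_dir=(0.0, 0.0, 1.0),
                     angle_tol_deg=5.0, min_length=0.5):
    """Simplified sketch of sub-steps a-c: find the sides shared by two
    adjacent triangles, keep those aligned with the reference orientation
    and discard those shorter than a threshold."""
    ref = np.asarray(ref_dir, dtype=float)
    ref = ref / np.linalg.norm(ref)

    # a. Sides shared by two adjacent triangles of the mesh.
    side_count = {}
    for tri in triangles:
        for a, b in ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])):
            key = (min(a, b), max(a, b))
            side_count[key] = side_count.get(key, 0) + 1
    shared_sides = [s for s, n in side_count.items() if n == 2]

    edges = []
    for a, b in shared_sides:
        d = vertices[b] - vertices[a]
        length = np.linalg.norm(d)
        if length < min_length:                      # c. length threshold
            continue
        cos = abs(float(np.dot(d / length, ref)))
        if np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))) <= angle_tol_deg:
            edges.append((a, b))                     # b. orientation filter
    # In the full method, collinear shared sides of consecutive triangles
    # would first be chained into a single longer edge 41.
    return edges
```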
  • said method comprises a step of calibration of the ground position in the displayed image after the step of displaying the image.
  • the user is asked to select the ground in the real image captured by the camera. This step helps in positioning the 3D model height relative to the captured image.
  • the step of identifying lines of interest in the image captured by the camera being carried out by the following sub-steps: a. capturing at least an image of the object; b. identifying lines of interest in the captured image using a line detection algorithm; c. determining the orientation of the identified lines of interest using an orientation device; and d. filtering the identified lines of interest with the same orientation as said reference orientation.
  • the step of identifying lines of interest in the captured image is realized using an algorithm chosen among: the Line Segment Detector, contour detection, Canny, Sobel, plane intersections or the Hough transform.
  • these algorithms are particularly efficient at detecting edges in an image.
  • the orientation device used to determine the orientation of the identified lines of interest corresponds to an accelerometer, a gyroscope and/or a magnetometer.
  • the accelerometer can give information on acceleration forces. Such forces may be static, like the continuous force of gravity.
  • the accelerometer can thus measure the gravitational pull to determine the angle at which an object is tilted with respect to the Earth. This feature is used to determine the orientation of the identified lines of interest in the captured image.
  • the gyroscope uses Earth's gravity to help determine orientation.
  • the magnetometer measures the direction, strength, or relative change of a magnetic field.
  • Vector magnetometers in particular have the capability to measure the component of the magnetic field in a specific direction, relative to the spatial orientation of the camera.
  • the method according to the invention may be carried out differently.
  • said camera includes at least one intensity sensor.
  • a RGB sensor is an intensity sensor. It contains an array of photodiodes covered by red, blue and green filters. The photodiodes receive the light coming from the object and convert it into a current in order to estimate the intensity of each color. The resulting image does not contain information on the orientation of lines in the 3D camera space, which is why this embodiment requires using an orientation device to select lines with a specific orientation.
  • said camera includes at least one depth sensor, the step of determination of the orientation of the identified lines of interest being realized using a reprojection of the image captured by the depth sensor in the 3D coordinate system of the 3D model using the camera capture features and said orientation device.
  • a depth sensor is a sensor capable of measuring distances between the sensor and a particular point of an object. It can generally be achieved by measuring the distortion and duration of a returning electromagnetic wave compared to the emitted one. This information can therefore be used to retrieve a 3D representation of the object.
  • a lidar sensor is a depth sensor capable of measuring distances by illuminating a target object with a laser light and measuring the reflection. Difference in laser travel time and wavelength can be used to retrieve a 3D representation of the object.
  • the resulting image already contains information on the orientation of lines in the 3D camera space. Therefore, this type of sensor allows using the depth information contained in the image in order to create a 3D representation of the image and retrieve the orientation of a line in the 3D camera space.
  • the orientation device is required during this step in order to determine the orientation of the depth sensor with respect to the reality.
  • said camera contains both at least one depth sensor and at least one intensity sensor, the step of identifying lines of interest being carried out by merging the filtered lines obtained from the at least one depth sensor and the filtered lines obtained by the at least one intensity sensor before the step of selecting the line.
  • Figure 1 is a block diagram of the method for aligning a 3D model of an object with an image captured by a camera according to an embodiment of the invention
  • Figure 2 is a representation of the step of selection of the 3D model among several 3D models and/or a subpart of the 3D model of the method from figure 1;
  • Figure 3a is a representation of the steps of definition of a capture position of the method from figure 1;
  • Figure 3b is a representation of the steps of definition of an angular position of the method from figure 1;
  • Figure 4 is a representation of the step of displaying the 3D model of the method from figure 1;
  • Figure 5 is a representation of a mesh of a 3D model according to an embodiment
  • Figure 6 is a representation of the step of selection of a remarkable edge in the 3D model of the method from figure 1;
  • Figure 7 is a representation of the step of selection of a line of interest in the image from reality of the method from figure 1;
  • Figure 8 is a representation of the step of calibration of the ground position in the image from reality of the method from figure 1;
  • Figure 9 is a block diagram of the line detection algorithm depending on the nature of the image sensor.
  • Figure 10 is a representation of different positions of the camera relative to the object and the resulting captured images;
  • Figure 11 is a representation of the triangulation process using the camera position and captured images from figure 10;
  • Figure 12 is a representation of the global process of 3D alignment in the case of a building project.

Detailed description
  • Figure 1 is a block diagram presenting the main steps of the method 10 for aligning a 3D model 40 of an object with an image 50 of the real world captured by a camera, according to an embodiment of the invention.
  • the method 10 may be hosted by an application that the user can install on his phone, tablet or any dedicated device.
  • the device shall be equipped with at least a camera in order to capture the environment and render an augmented version of it.
  • the environment shall contain at least part of an object that is to be augmented.
  • the method 10 according to the invention will then be able to recognize a particular feature of this object in the 3D model 40 and in the reality and match the two in order to create an augmented reality.
  • the method 10 includes a first step 11 of selection of a project 21.
  • the project 21 contains at least a 3D model 40 of an object generated using a dedicated software.
  • the 3D model 40 is a representation of the object using a collection of points positioned in a 3D space, also called a point cloud.
  • the points may be connected by various geometric entities such as polygons, lines or curved surfaces.
  • the 3D model 40 is built using a mesh 60 of triangles A-R, as illustrated in figure 5.
  • BIM: Building Information Modeling
  • a BIM is supported by various tools involving the generation and management of digital representations of physical and functional characteristics of the building project 21.
  • the BIM also includes one or several 3D models 40 of the building.
  • the project 21 may be stored in the memory of the user's device or on the cloud and be accessed using said device or with any type of device including an internet connection and a web browser. This solution makes it possible to access the BIM at any time and to synchronize in real time with the BIM project as soon as an update is made.
  • a building project 21 may be separated into subprojects 22, each subproject 22 containing a 3D model 40 of a floor of the building.
  • the application can display a list of the subprojects 22 and the user may therefore select the subproject 22 he needs to display.
  • the method 10 according to the invention also includes a second step 12 of definition of a capture position 31 of the image 50 relative to the object.
  • As the user is usually holding the device containing a camera that is capturing an image 50 of the environment, the user is asked to define his position relative to the object that needs to be augmented. For instance, in the case of a building project 21, the user is asked to define his position in the building.
  • the application may display the selected subproject 22 in three dimensions.
  • the user can then navigate into the 3D model 40 using a control pad.
  • the application may also display a 2D above representation of the object. For instance, as illustrated on figures 3a and 3b, in the case of a building project 21, a map 30 of the floor can be displayed to the user. The user can therefore select on the map 30 his position 31.
  • the method 10 may also ask the user to define 13 his angular position α1-α2 relative to the displayed object.
  • the user is asked to define how the object is oriented relative to his position 31, so that the selection of the resulting edge can be achieved via the device in 3D.
  • the user may define how the 3D model of the floor is positioned relative to the direction he is facing.
  • To define this angular position α1-α2, the user may rotate the 2D above representation around his capture position 31 as shown in figures 3a and 3b.
  • the method 10 may also extract 14 the capture features of the camera by which the scene is captured. Indeed, depending on the type of the camera, the displayed image 50 of the environment can change a lot. For instance, a wide-angle video camera captures images with a fisheye effect, meaning that the image 50 is distorted and all the objects are drawn in perspective from the center.
  • the capture features may include intrinsic parameters of the camera, such as the focal length and lens distortion and extrinsic parameters of the camera, used to describe the transformation between the camera and the real world.
  • the method extracts 14 the capture features of the camera in order to display 15 the 3D model 40 with said capture features.
  • the fifth step of the method 10 is to display 15 to the user the 3D model 40 using the defined capture position 31, the defined angular position α1-α2 and the capture features of the camera.
  • For instance, if a user is located in a room of the building, facing a wall of the room where a window needs to be installed, the user will see, displayed on the screen of the device, a 3D model 40 of this wall containing said window to install.
  • the 3D model 40 is superposed to the image 50 captured by the camera in order for the user to be able to compare the reality and the 3D model 40 on the screen of his device without having to check with his eyes what the reality looks like. Indeed, as mentioned previously, his vision does not have the same capture features as the camera and the wall that he sees in the reality might not look the same through the camera lens.
  • a remarkable edge 41 is, for instance, a line separating two adjacent walls or a wall and the ceiling.
  • the edges 41 are identified using the mesh 60 of the 3D model 40.
  • the edges 41 are identified by detecting the alignment of consecutive triangles A-R on a same line.
  • triangles K and O are adjacent triangles, meaning that they share a side.
  • the straight line passing through this edge also passes through the side shared by triangles L and M.
  • a line passing through two consecutive sides of adjacent triangles can be identified as an edge 41.
  • this edge might not be the longest edge there is.
  • the edge 41 goes from triangle O to triangle R.
  • the identified edges 41 can be filtered depending on their size. Only the edges 41 with a length greater than a threshold value may be displayed to the user.
  • a further step is to select the edges 41 orientated along a same reference orientation.
  • the reference orientation can either be predetermined or be chosen by the user depending on the nature of the 3D model 40. For instance, only horizontal lines can be chosen, or vertical lines, or lines with a predetermined angle or a range of predetermined angles with the vertical, for instance lines at an angle between 40° and 60° to the vertical.
  • the resulting identified edges 41 are displayed on the 3D model 40.
  • the user is then invited, by a message 42 appearing on the screen for instance, as shown in figure 6, to select 17 an edge on the 3D model 40. If several edges are present in the selected area, the closest edge to the selected zone is selected.
  • the selected edge 43 may be highlighted compared to the other edges in order for the user to check if the right edge was selected.
  • the next step 18 of the method 10 consists in displaying the camera feed.
  • This feed can either be a static image or a video.
  • the user may be asked to select the ground position 46 in the image 50 by a message 47 appearing on the screen for instance, as shown in figure 8.
  • This step is used to calibrate 19 the ground position 46 in order to ease the process of alignment of the 3D model 40 on the captured image 50.
  • the ground position 46 will help in positioning the height of the 3D model 40.
  • the calibration step 19 will help in positioning the walls of the room relative to the ground so that they do not appear to float.
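  • A minimal sketch of this calibration is given below, under the assumption that a depth value is available at the pixel selected as ground (for instance from a depth sensor): the pixel is back-projected into camera coordinates and combined with the gravity direction from the orientation device to define the ground plane on which the 3D model 40 can be placed. The function name and parameters are illustrative.

```python
import numpy as np

def ground_plane_from_pixel(u, v, depth, K, gravity_cam):
    """Sketch of ground calibration step 19, assuming a depth value is
    available at the pixel (u, v) selected as ground by the user.
    Returns a point on the ground and its normal (the gravity direction),
    both in camera coordinates; the 3D model 40 can then be translated
    along the normal so that its floor lies on this plane."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    # Back-project the selected pixel into 3D camera coordinates.
    ground_point = np.array([(u - cx) * depth / fx,
                             (v - cy) * depth / fy,
                             depth])

    # The ground is assumed horizontal, i.e. normal to gravity.
    normal = gravity_cam / np.linalg.norm(gravity_cam)
    return ground_point, normal
```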
  • the image 50 is processed in real time in order to identify 20 lines of interest 44.
  • the image processing may be carried out differently. Indeed, many types of sensors can be used to obtain an image of an object. For instance, it is possible to use RGB sensors, infrared sensors, visible light sensors, hyperspectral sensors, multi-lens sensors, structured light sensors, lidar sensors...
  • two types of sensors can be distinguished: depth sensors, such as a lidar sensor, whose images contain information on the position of the lines of interest 44 in the 3D camera space, and intensity sensors, such as a RGB sensor, whose images contain no such information.
  • Figure 9 is a block diagram of the image processing method depending on the nature of the sensor.
  • the left part of the block diagram is dedicated to the treatment of an image captured by at least one intensity sensor.
  • the right part of the block diagram is dedicated to the treatment of a map captured by a depth sensor.
  • an intensity sensor is either a RGB Ultra-Wide sensor with the following features: 1920 x 1440 pixels, 24 bits, 60 fps, a fixed focus of 13 mm, and/or a RGB Wide sensor with 1920 x 1080 pixels, 24 bits, 60 fps, an auto focus of 29 mm.
  • the depth sensor may be a Lidar sensor with 256 x 192 pixels, 32 bits, 60 fps, and a fixed focus. More generally, the invention may be applied to any type of sensor embedded in a mobile device.
  • the camera may include only an intensity sensor.
  • the first step of identification 20 of lines of interest 44 is to take 51 an intensity image 50 of the object.
  • An intensity image is a 2D array of pixels, each pixel being characterized by its (x,y) coordinates and its intensity value.
  • the lines of interest 44 are characterized by sharp changes in the image brightness.
  • the image 50 is processed 52 using edge detection algorithms such as the Line Segment Detector algorithm, a contour detection algorithm, the Canny algorithm, the Sobel algorithm or a plane intersection algorithm. These algorithms detect variations in the pixel values by using derivatives, thresholds and/or various other methods.
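  • As an illustration, the sketch below detects candidate lines of interest 44 in an intensity image with standard OpenCV operators (Canny edge detection followed by a probabilistic Hough transform); any of the algorithms listed above could be substituted, and the threshold values used here are assumptions.

```python
import numpy as np
import cv2

def detect_lines_of_interest(image_bgr):
    """Sketch of processing step 52: detect candidate line segments from
    sharp brightness changes in an intensity image. Threshold values are
    illustrative and would be tuned to the capture device."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)

    # Edge map from intensity gradients (derivatives + thresholds).
    edges = cv2.Canny(gray, 50, 150)

    # Probabilistic Hough transform: extract straight segments as
    # (x1, y1, x2, y2) quadruplets from the edge map.
    segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                               threshold=80, minLineLength=60, maxLineGap=5)
    return [] if segments is None else segments.reshape(-1, 4)
```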
  • the lines of interest 44 identified in the intensity image are a projection from the 3D space on a 2D image. Thus, it is impossible to know their position in the 3D space without further information.
  • the orientation device contained in the camera is used.
  • This orientation device may either be an accelerometer, a gyroscope and/or a magnetometer.
  • an accelerometer can measure gravitational pull to determine the angle at which an object is tilted with respect to the Earth. This feature is used to determine the lines of interest 44 that are collinear to the selected edge 43 in the captured image 50.
  • a gravity vector, obtained thanks to the orientation device, is first projected from the optical axis at infinity.
  • the gravity vector is placed at infinity along the optical axis; then said gravity vector is scaled to be infinite in size. Afterwards, the gravity vector is projected back onto the sensor. The point where the projected gravity vector encounters the sensor is called the central horizon point.
  • the identified lines of interest 44 are compared to this central horizon point.
  • the lines of interest 44, which are often line segments, are extended in order to see if they cross the projected gravity vector at substantially the central horizon point position, meaning within 5 pixels and preferably within 2 pixels. If this is the case, the lines of interest 44 are vertical.
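  • One possible way to carry out this test, sketched below under the assumptions of negligible lens distortion and an optical axis not perpendicular to gravity, is to project the gravity direction through the camera intrinsics to obtain the central horizon point (the vanishing point of vertical lines) and to check whether the supporting line of each detected segment passes within a few pixels of that point.

```python
import numpy as np

def is_vertical(segment, gravity_cam, K, tol_pixels=5.0):
    """Sketch of the vertical-line test: the gravity direction (from the
    orientation device) is projected through the intrinsics K to obtain
    the central horizon point, and the segment is declared vertical when
    its supporting line passes within tol_pixels of that point."""
    # Vanishing point of all vertical lines = projection of the gravity
    # direction (a point at infinity along gravity).
    vp = K @ (gravity_cam / np.linalg.norm(gravity_cam))
    vp = vp[:2] / vp[2]

    x1, y1, x2, y2 = segment
    p1 = np.array([x1, y1], dtype=float)
    p2 = np.array([x2, y2], dtype=float)

    # Distance from the vanishing point to the infinite line through
    # the (extended) segment.
    d = p2 - p1
    d = d / np.linalg.norm(d)
    offset = vp - p1
    dist = abs(offset[0] * d[1] - offset[1] * d[0])
    return dist <= tol_pixels
```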
  • the camera may include only a depth sensor such as a lidar sensor.
  • the first step of identification 20 of the lines of interest 44 is to capture 61 an image of the object.
  • With a lidar sensor, it is possible to capture a depth map and/or a confidence map of the object.
  • the depth map is a 2D array of pixels, each pixel being characterized by its (x,y) coordinates and its value, which is the average distance traveled by a light beam in order to be reflected by a specific area of the object.
  • a confidence map is a 2D array of pixels, each pixel being characterized by its (x,y) coordinates and its confidence value. It helps in estimating the signal quality. Indeed, depending on the nature of the object and the material it is made of, the light used to measure the object depth can be reflected differently and even be deviated. Thus, the confidence map measures the intensity of the signal until saturation of the sensor for each point of the object.
  • the depth image is processed 62, using a line detection algorithm such as the Hough transform algorithm, in order to detect the lines of interest 44 by finding large variations of depth from one pixel to another.
  • the depth image is reprojected 63 in three dimensions using the information contained in the depth image.
  • the orientation device is used to determine how the camera is tilted with respect to the real world.
  • This reprojection 63 is preferably carried out using the camera parameters in order to adapt the 3D representation to the camera features and deformations.
  • the identified lines 44 are then filtered 64 in order to select only the lines 44 with the same orientation as the selected edge 43, knowing their orientation in the 3D space thanks to reprojection step 63.
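  • The sketch below illustrates, under simplified assumptions, the reprojection step 63 and the filtering step 64: the endpoints of each detected segment are back-projected into 3D camera coordinates using the depth map and the camera intrinsics, and the resulting 3D direction is compared with the reference orientation expressed in camera coordinates (for instance the gravity direction given by the orientation device); function names and tolerances are illustrative.

```python
import numpy as np

def backproject(u, v, depth_map, K):
    """Reproject a pixel of the depth map into 3D camera coordinates
    (sketch of reprojection step 63)."""
    z = depth_map[int(v), int(u)]
    return np.array([(u - K[0, 2]) * z / K[0, 0],
                     (v - K[1, 2]) * z / K[1, 1],
                     z])

def filter_by_orientation(segments, depth_map, K, ref_dir_cam,
                          angle_tol_deg=5.0):
    """Sketch of filtering step 64: keep only the detected segments whose
    3D direction, recovered from the depth map, matches the reference
    orientation expressed in camera coordinates (e.g. the gravity
    direction from the orientation device for vertical edges)."""
    ref = ref_dir_cam / np.linalg.norm(ref_dir_cam)
    kept = []
    for x1, y1, x2, y2 in segments:
        p1 = backproject(x1, y1, depth_map, K)
        p2 = backproject(x2, y2, depth_map, K)
        d = p2 - p1
        if np.linalg.norm(d) == 0:
            continue
        d = d / np.linalg.norm(d)
        angle = np.degrees(np.arccos(np.clip(abs(np.dot(d, ref)), 0.0, 1.0)))
        if angle <= angle_tol_deg:
            kept.append((x1, y1, x2, y2))
    return kept
```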
  • the resulting identified lines 44 are displayed 20 on the captured image 50.
  • the user is then invited, by a message 45 appearing on the screen for instance, as shown in figure 7, to select 21 a line on the image 50 from reality.
  • the edge can be selected by the user by moving the camera until the selected edge is aligned with the real edge. If several lines 44 are present in the selected area, the closest line 44 to the selected zone is selected.
  • the selected line 48 may be highlighted compared to the other lines 44 in order for the user to check if the right line was selected.
  • the 3D model 40 is preferably hidden during this step.
  • the process includes additional steps. Indeed, the selected line 48 is tracked 54 to compute its projection from a 2D image 50 space to a 3D model 40 space. In fact, several lines from the 3D real world may have the same projection onto a certain plane. In other words, with an image taken from a single point of view, it is impossible to determine the 3D orientation of this selected line 48.
  • the projection can either be computed via automatic plane detection of the ground, ceiling or manual operation by the user.
  • figure 10 shows an example of three images k-1, k, k+1 taken during the user’s movement around the object. Each captured image k-1, k, k+1 shows the object from a different point of view.
  • the first information required is to determine the position of the camera relative to the line 48 for every captured image k-1, k, k+1.
  • the camera position can be determined using the internal position tracker of the device.
  • the camera may include two RGB sensors separated by a fixed known distance d. It is therefore possible to use the two images captured by the two RGB sensors in order to triangulate the object position without requiring the user to move around.
  • the camera may include more than two RGB sensors and/or other types of sensors, such as hyperspectral sensors, multi-lens sensors or structured light sensors, in order to get a better triangulation of the object position.
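  • As an illustration of the triangulation step 55, the sketch below recovers the 3D endpoints of the selected line 48 from its pixel observations in two calibrated views, whether these come from two positions of a moving camera given by the internal position tracker or from two sensors separated by the known baseline d; it relies on OpenCV's triangulatePoints and assumes the camera poses are known.

```python
import numpy as np
import cv2

def projection_matrix(K, R, t):
    """3x4 projection matrix from intrinsics and a world-to-camera pose."""
    return K @ np.hstack([R, np.asarray(t, dtype=float).reshape(3, 1)])

def triangulate_line(K, pose1, pose2, endpoints1, endpoints2):
    """Sketch of triangulation step 55: recover the 3D endpoints of the
    selected line 48 from its pixel endpoints observed in two views.
    Each pose is a (R, t) pair; endpoints are [(x1, y1), (x2, y2)]."""
    P1 = projection_matrix(K, *pose1)
    P2 = projection_matrix(K, *pose2)

    pts1 = np.asarray(endpoints1, dtype=float).T   # 2 x 2 array
    pts2 = np.asarray(endpoints2, dtype=float).T

    hom = cv2.triangulatePoints(P1, P2, pts1, pts2)  # 4 x 2 homogeneous
    return (hom[:3] / hom[3]).T                      # two 3D endpoints
```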
  • the process goes on to the next step, which is the alignment 22 of the 3D model 40 with the reality 50.
  • the camera includes at least both an intensity sensor and a depth sensor. This embodiment may make it possible to compare the lines identified on both captured images and to confirm or deny the existence of an identified line 44.
  • a depth sensor has a smaller resolution than an intensity sensor.
  • a depth sensor has a resolution that is a hundred times smaller than that of an intensity sensor.
  • the adaptation may take into account the parameters of both sensors and the distance between the two sensors.
  • the pixels from the intensity captured image include depth information coming from the depth sensor, and thus information on the orientation of the selected lines 44.
  • This embodiment may therefore make it possible to omit the steps of tracking 54 and triangulating 55 the object on the captured image from the intensity sensor.
  • the alignment step 22 consists in aligning the selected edge 43 from the 3D model 40 with the selected line 48 from the captured image 50 via rigid transformation, meaning that the line is only rotated and translated and not rescaled.
  • the obtained transformation is also applied to the rest of the 3D model 40.
  • the 3D model can stay aligned with the camera image feed using a detection of the movement of the object and/or a detection of the movement of the capture device.
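  • The sketch below shows one possible way to compute such a rigid transformation (rotation and translation only, no rescaling) mapping the selected edge 43 onto the selected line 48, which can then be applied to the whole 3D model 40. A single pair of matched lines does not fully constrain the pose (the rotation about the line remains free); this illustration simply aligns the two directions with the minimal rotation and matches the midpoints, the remaining degree of freedom being fixed in practice by the capture position and the ground calibration.

```python
import numpy as np

def rigid_transform_edge_to_line(edge_pts, line_pts):
    """Sketch of alignment step 22: a rotation and translation (no
    rescaling) mapping the selected model edge onto the measured line.
    The free rotation about the line is not resolved here; the sketch
    aligns the two directions with the minimal rotation and matches
    the midpoints."""
    e0, e1 = np.asarray(edge_pts, dtype=float)
    l0, l1 = np.asarray(line_pts, dtype=float)

    a = (e1 - e0) / np.linalg.norm(e1 - e0)   # model edge direction
    b = (l1 - l0) / np.linalg.norm(l1 - l0)   # real-world line direction

    v = np.cross(a, b)
    c = float(np.dot(a, b))
    if np.linalg.norm(v) < 1e-9:
        if c > 0:                              # already parallel
            R = np.eye(3)
        else:                                  # opposite: rotate 180 deg
            perp = np.eye(3)[int(np.argmin(np.abs(a)))]
            axis = np.cross(a, perp)
            axis = axis / np.linalg.norm(axis)
            R = 2.0 * np.outer(axis, axis) - np.eye(3)
    else:                                      # Rodrigues formula
        vx = np.array([[0.0, -v[2], v[1]],
                       [v[2], 0.0, -v[0]],
                       [-v[1], v[0], 0.0]])
        R = np.eye(3) + vx + vx @ vx * ((1.0 - c) / float(np.dot(v, v)))

    # Translation matching the midpoints of the edge and the line.
    t = (l0 + l1) / 2.0 - R @ ((e0 + e1) / 2.0)
    return R, t

# Usage sketch: apply the same rigid transform to every model vertex.
# R, t = rigid_transform_edge_to_line(selected_edge, selected_line)
# aligned_vertices = vertices @ R.T + t
```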
  • the invention is an alignment method for matching a 3D model with reality, that does not require too many manipulations from the user and that is faster than existing methods.
  • the invention improves the end-user experience during the first alignment step of an AR system but also during the realignment step.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to a method for aligning a 3D model of an object with an image comprising at least a portion of said object, said method comprising the following steps: defining (12) a capture position of the image; displaying (15) the 3D model simulated from said capture position; identifying (16) remarkable edges in the 3D model, said remarkable edges having the same reference orientation; selecting (17) an edge among the identified remarkable edges from the displayed 3D model; displaying (18) the image captured by the camera; identifying (20) lines of interest in the image captured by the camera, said lines of interest having the same orientation as said reference orientation; selecting (21) a line, among said lines of interest, corresponding to the selected edge; and aligning (22) and/or transforming the 3D model so that the selected edge is brought into correspondence with the selected line.
PCT/EP2022/050605 2021-01-15 2022-01-13 Procédé d'alignement d'un modèle 3d d'un objet avec une image comprenant au moins une partie dudit objet pour obtenir une réalité augmentée WO2022152778A2 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP22701552.6A EP4278327A2 (fr) 2021-01-15 2022-01-13 Procédé d'alignement d'un modèle 3d d'un objet avec une image comprenant au moins une partie dudit objet pour obtenir une réalité augmentée

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP21151907.9 2021-01-15
EP21151907 2021-01-15

Publications (2)

Publication Number Publication Date
WO2022152778A2 true WO2022152778A2 (fr) 2022-07-21
WO2022152778A3 WO2022152778A3 (fr) 2022-08-25

Family

ID=74187125

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2022/050605 WO2022152778A2 (fr) 2021-01-15 2022-01-13 Procédé d'alignement d'un modèle 3d d'un objet avec une image comprenant au moins une partie dudit objet pour obtenir une réalité augmentée

Country Status (2)

Country Link
EP (1) EP4278327A2 (fr)
WO (1) WO2022152778A2 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014114118A1 (fr) 2013-01-28 2014-07-31 Tencent Technology (Shenzhen) Company Limited Procédé et dispositif de mise en œuvre destinés à une réalité augmentée pour code bidimensionnel
US10482663B2 (en) 2016-03-29 2019-11-19 Microsoft Technology Licensing, Llc Virtual cues for augmented-reality pose alignment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10026209B1 (en) * 2017-12-21 2018-07-17 Capital One Services, Llc Ground plane detection for placement of augmented reality objects

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014114118A1 (fr) 2013-01-28 2014-07-31 Tencent Technology (Shenzhen) Company Limited Procédé et dispositif de mise en œuvre destinés à une réalité augmentée pour code bidimensionnel
US10482663B2 (en) 2016-03-29 2019-11-19 Microsoft Technology Licensing, Llc Virtual cues for augmented-reality pose alignment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
J. ALISON NOBLE: "Finding corners", IMAGE AND VISION COMPUTING, vol. 6, no. 2, 1988, pages 121-128
JIANBO SHI; TOMASI: "Good features to track", PROCEEDINGS OF IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 1994

Also Published As

Publication number Publication date
WO2022152778A3 (fr) 2022-08-25
EP4278327A2 (fr) 2023-11-22

Similar Documents

Publication Publication Date Title
US10977818B2 (en) Machine learning based model localization system
CN108898676B (zh) 一种虚实物体之间碰撞及遮挡检测方法及系统
Aladren et al. Navigation assistance for the visually impaired using RGB-D sensor with range expansion
JP5905540B2 (ja) 画像の少なくとも1つの特徴として記述子を提供する方法及び特徴をマッチングする方法
US7519218B2 (en) Marker detection method and apparatus, and position and orientation estimation method
JP6168833B2 (ja) 3DGeoArcを用いた多モードデータの画像登録
US10846844B1 (en) Collaborative disparity decomposition
WO2016029939A1 (fr) Procédé et système pour déterminer au moins une caractéristique d'image dans au moins une image
US20170214899A1 (en) Method and system for presenting at least part of an image of a real object in a view of a real environment, and method and system for selecting a subset of a plurality of images
US20110292036A1 (en) Depth sensor with application interface
US20100328308A1 (en) Three Dimensional Mesh Modeling
US9182220B2 (en) Image photographing device and method for three-dimensional measurement
US9940716B2 (en) Method for processing local information
CN107025663A (zh) 视觉系统中用于3d点云匹配的杂波评分系统及方法
EP3629302B1 (fr) Appareil de traitement d'informations, procédé de traitement d'informations et support d'informations
KR20180123302A (ko) 볼의 궤적을 시각화하는 방법 및 장치
EP2779102A1 (fr) Procédé de génération d'une séquence vidéo animée
Kochi et al. 3D modeling of architecture by edge-matching and integrating the point clouds of laser scanner and those of digital camera
JP6262610B2 (ja) 情報登録装置及び情報継続登録装置並びに方法及びプログラム
CN114155233A (zh) 获得表示图像的锐度级别的配准误差图的装置和方法
WO2022152778A2 (fr) Procédé d'alignement d'un modèle 3d d'un objet avec une image comprenant au moins une partie dudit objet pour obtenir une réalité augmentée
US10339702B2 (en) Method for improving occluded edge quality in augmented reality based on depth camera
Lee et al. Tracking with omni-directional vision for outdoor AR systems
KR20170108552A (ko) 수변구조물 피해정보 분석을 위한 정보시스템 구축방안
Radanovic et al. Virtual Element Retrieval in Mixed Reality

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22701552

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2022701552

Country of ref document: EP

Effective date: 20230816

WWE Wipo information: entry into national phase

Ref document number: 523441614

Country of ref document: SA