EP4107699A1 - A method for generating a dataset, a method for generating a neural network, and a method for constructing a model of a scene - Google Patents
- Publication number
- EP4107699A1 (application EP21710811.7A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- image
- images
- scene
- focal length
- depth
- Prior art date
- Legal status
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
- G06T7/579—Depth or shape recovery from multiple images from motion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30248—Vehicle exterior or interior
- G06T2207/30252—Vehicle exterior; Vicinity of vehicle
Definitions
- the present invention relates, in general, to a method for generating a dataset for training an image depth estimation neural network, a method for generating an image depth estimation neural network, and a method for constructing a three-dimensional model of a scene.
- the 3D model may be constructed from images that image the scene from different viewpoints, using a structure from motion algorithm.
- the structure from motion algorithm may find correspondences between images, i.e. find points that occur in several images, and analyze these correspondences to form the 3D model.
- a 3D reconstruction algorithm that uses depth information from a depth sensor to improve the accuracy of the 3D model is described in [3D Mapping with an RGB-D Camera].
- if a camera does not comprise a depth sensor, it may still be possible to extract depth information of the points in the images of the scene using a neural network.
- the neural network may e.g. be trained using the Camera-Aware Multi-scale Convolutions (CAM-Convs) method, as described in [Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 11826-11835]
- the CAM-Convs method may concatenate camera internal parameters to the feature maps, and hence allow the network to learn the dependence of depth on these parameters.
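The idea of concatenating intrinsics-derived maps to feature maps can be illustrated with a minimal sketch. The channel set below (centred pixel coordinates and per-pixel viewing angles) is a simplified assumption, not the exact set used by CAM-Convs:

```python
import numpy as np

def intrinsics_channels(h, w, fx, fy, cx, cy):
    """Per-pixel maps derived from camera internal parameters, in the
    spirit of CAM-Convs: centred pixel coordinates and per-pixel viewing
    angles. The exact channel choice is a simplified assumption."""
    ccx = np.broadcast_to(np.arange(w, dtype=float) - cx, (h, w))
    ccy = np.broadcast_to((np.arange(h, dtype=float) - cy)[:, None], (h, w))
    fov_x = np.arctan(ccx / fx)  # horizontal viewing angle of each pixel
    fov_y = np.arctan(ccy / fy)  # vertical viewing angle of each pixel
    # (4, h, w); would be resized to a feature map's spatial size and
    # concatenated along the channel axis before a convolution
    return np.stack([ccx, ccy, fov_x, fov_y])
```

In a network, these channels would let convolutions condition on where each pixel sits relative to the principal point, rather than having to infer the intrinsics from image content.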
- the invention stems from the realization that not all cameras comprise a depth sensor. Thus, there is room for improvement of the accuracy of 3D models of scenes when depth sensor data is unavailable or corrupt.
- depth information may be estimated from the images themselves, even from single images, using a neural network that is suitably trained. We herein call such a neural network an image depth estimation neural network.
- a neural network may be generated using training data derived from a suitable dataset. According to the inventive concept a method for generating a suitable dataset and a method for generating a suitably trained image depth estimation neural network are provided together with a method for generating a 3D model.
- Each of these aspects of the invention may provide better 3D models of scenes. This in turn may e.g. result in autonomous vehicles navigating in a safer and more accurate manner.
- a method for generating a dataset for training an image depth estimation neural network comprising: receiving an image series comprising a plurality of images of a scene, each image being an image acquired by a camera, the camera having a position at the time of the acquisition of the image; receiving, for each image of the image series, a measured camera position, the measured camera position being the position of the camera measured at the acquisition of the image; forming a three-dimensional (3D) reconstruction of the scene, the 3D reconstruction comprising coordinates of a 3D model of the scene and a reconstructed camera position for each image, the reconstructed camera position being an estimate of the position of the camera relative to the 3D model at the acquisition of the image, wherein the 3D reconstruction of the scene is formed by running a structure from motion algorithm on the image series, wherein the structure from motion algorithm is configured to align the 3D reconstruction to the measured camera positions; calculating at least one depth measure of at least one image of the image series based on the 3D reconstruction; and entering pairs of image data and depth data into the dataset.
- An advantage of the dataset is that it may provide accurate depth measures of the images in the dataset.
- the scale of the 3D model may become accurate.
- the 3D model may have a scale where one meter in the model corresponds to one meter in the real-world scene.
- the dataset may enable accurate future 3D modelling of a scene that is not represented by the dataset.
- position measurements may not be available and/or only few images may be available.
- the ability to accurately interpret depth from images, acquired by training the image depth estimation neural network on the dataset which was generated from images where measured camera positions were available, may improve the accuracy of the future 3D model.
- the dataset comprising pairs of image data and depth data may be used in the training of an image depth estimation neural network.
- Image data from the dataset may be used as input data in the training while depth data from the dataset may be used as output data in the training.
- the image depth estimation neural network may learn to accurately estimate depth from images.
- Said image depth estimation neural network may, at a later time, be used in 3D modelling of a scene by providing estimates of image depth, e.g. from single images, which improve the accuracy of the 3D model in a similar fashion to how RGB-D camera depth measures improve the accuracy.
- the requirements on the cameras contributing to the images of the dataset may be low.
- the cameras may e.g. not comprise a depth sensor.
- the cameras may e.g. be smartphone cameras, dashboard cameras, action cameras etc.
- the cameras may also be a variety of the previously mentioned, and/or other, camera types.
- the cameras may e.g. comprise an image sensor for acquiring the image and a global positioning system (GPS) sensor for providing the measured camera position. Greater requirements may not be needed. This may contribute to a low cost of creating the dataset, since images taken for other purposes, not necessarily with the sole purpose of generating the dataset, may be used.
- the dataset may enable cost-effective 3D modelling of a scene.
- the cost of future 3D modelling may be low if the image depth estimation neural network is trained using a low-cost dataset.
- the cost of future 3D modelling may be low if it can be done using a variety of cameras. This may be facilitated by the use of training data derived from a dataset according to the first aspect.
- a dataset that may be generated from images from a variety of cameras may improve the ability to handle a variety of cameras in the future 3D modelling.
- the image series may comprise images wherein each image overlaps at least partially with another image of the image series.
- the image series may depict a scene from a plurality of viewpoints or a plurality of angles.
- the scene may herein comprise one or more objects.
- the image series may e.g. be a series of street view images acquired from a vehicle as the vehicle moves along the street.
- the measured camera position may be a position measured with a positioning system, e.g. a GPS positioning system, a Wi-Fi positioning system, a cell ID positioning system, or a motion capture system.
- the measured position may be a position measured with a combination of positioning systems.
- the positioning system may be associated with the camera that acquired the image, e.g. integrated in the camera or in communication with the camera.
- the estimate of the position of the camera relative to the 3D model may at first be an estimate based on the image series without accounting for the measured camera position.
- the structure from motion algorithm may firstly form the 3D model and the estimate of the positions of the cameras relative to the 3D model based on the image series.
- the structure from motion algorithm may secondly align the 3D reconstruction to the measured positions by adjusting the coordinates of the 3D model and the coordinates of the estimated positions of the cameras.
- a scale may be imposed on the model.
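The alignment-and-scale step described above can be illustrated by a least-squares similarity (Umeyama-style) alignment of reconstructed camera positions to measured ones. This is a sketch under the assumption that positions are given as N×3 arrays, not the patent's specific algorithm:

```python
import numpy as np

def align_to_measured_positions(reconstructed, measured):
    """Estimate a similarity transform (scale s, rotation R, translation t)
    mapping reconstructed camera positions onto measured (e.g. GPS)
    positions in the least-squares sense (Umeyama alignment).

    reconstructed, measured: (N, 3) arrays of corresponding positions.
    Returns s, R, t such that s * R @ x + t approximates the measurement,
    so applying the transform to the whole model imposes a metric scale.
    """
    mu_r = reconstructed.mean(axis=0)
    mu_m = measured.mean(axis=0)
    xr = reconstructed - mu_r
    xm = measured - mu_m
    cov = xm.T @ xr / len(reconstructed)          # cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:  # reflection guard
        S[2, 2] = -1
    R = U @ S @ Vt
    var_r = (xr ** 2).sum() / len(reconstructed)  # spread of reconstruction
    s = np.trace(np.diag(D) @ S) / var_r          # metric scale factor
    t = mu_m - s * R @ mu_r
    return s, R, t
```

Applying the recovered transform to the 3D model coordinates and reconstructed camera positions yields a model in which one meter corresponds to one meter in the real-world scene, as noted above.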
- the 3D reconstruction may, in addition to coordinates of a 3D model of the scene and a reconstructed camera position for each image, also comprise a reconstructed camera orientation for each image.
- the reconstructed camera position and the reconstructed camera orientation of an image may form a reconstructed camera pose for the image.
- the method may include in the dataset only images that fulfill certain requirements.
- Such requirements may e.g. be: only including images where an estimated focal length of the camera is available, or only including images where the 3D reconstruction is made using at least 2 neighboring images taken within a radius of 10 meters.
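Such inclusion requirements could be sketched as a filter over hypothetical per-image metadata (the dict field names are assumptions, not from the source):

```python
def keep_image(meta, min_neighbors=2, radius_m=10.0):
    """Keep an image only if an estimated focal length is available and at
    least `min_neighbors` other images were taken within `radius_m` of its
    position. `meta` is a hypothetical metadata dict with keys
    'focal_length' and 'neighbor_distances_m'."""
    if meta.get("focal_length") is None:
        return False  # no estimated focal length available
    near = [d for d in meta["neighbor_distances_m"] if d <= radius_m]
    return len(near) >= min_neighbors
```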
- calculating a depth measure of an image may be done by a multi-view stereo algorithm, e.g. by a patch based multi-view stereo algorithm.
- a distance between a viewpoint from which the image was taken and a point in the scene may be the distance between the optical center of the camera and the point in the scene.
- the optical center may e.g. be the aperture or lens of the camera.
- a representation of said distance may e.g. be said distance itself, i.e. what is sometimes simply called distance.
- a representation of said distance may be the shortest distance between the point in the scene and a camera plane, wherein the camera plane is orthogonal to the optical axis of the camera and comprises the optical center of the camera, i.e. what is sometimes simply called depth.
- the optical axis may herein be the viewing direction of the camera.
- the viewpoint may be essentially the same as the position of the camera at the time of the acquisition of the image.
- At least one depth measure of an image may be an entry in a depth map of the image.
- the depth map may comprise an array of depth measures wherein each depth measure corresponds to the depth measure of a certain pixel, or group of pixels, in the image.
- entries of a depth map may be undefined, e.g. marked by “not a number”. It may not be possible to calculate depth measures for all pixels in an image.
- depth measures may be validated before entered in the dataset. A depth measure of a point in a scene in an image may only be entered into the dataset if it agrees with another depth measure of the same point in the scene in another image.
- the dataset may comprise pairs of image data and depth data in the form of one image and one depth map of the image in each pair.
- the structure from motion algorithm may be configured to align the 3D reconstruction to the measured camera positions through an adjustment of the coordinates of the 3D reconstruction, wherein the adjustment of coordinates penalizes reconstructed camera positions that deviate from the corresponding measured camera positions.
- the adjustment of the coordinates of the 3D reconstruction may be bundle adjustment. Adjustment of the coordinates of the 3D reconstruction, such as bundle adjustment, may be an effective way to obtain an accurate 3D reconstruction.
- a rough 3D reconstruction may first be formed, after which the coordinates may be adjusted. This may be more effective than aligning the 3D reconstruction to the measured positions already in an initial phase of the formation of the 3D reconstruction.
- the term adjustment may be construed as an adjustment of an initial 3D reconstruction, wherein the initial 3D reconstruction did not account for the measured camera positions.
- the adjustment of coordinates may penalize reconstructed camera positions that deviate from the corresponding measured camera positions by imposing a cost for each reconstructed camera position that depends on a distance between said reconstructed camera position and the corresponding measured camera position.
- the cost may be proportional to a function of the distance between the reconstructed camera position and the corresponding measured camera position, e.g. the Euclidean distance, the squared distance, or robust functions such as the Cauchy loss.
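The cost alternatives listed above can be sketched as follows, as a term that could be added to a bundle-adjustment objective (the Cauchy scale parameter `c` is an assumption):

```python
import numpy as np

def position_cost(reconstructed, measured, loss="cauchy", c=1.0):
    """Cost penalizing a reconstructed camera position that deviates from
    its measured position. The scale parameter c of the Cauchy loss is an
    illustrative assumption."""
    d = np.linalg.norm(np.asarray(reconstructed, float) - np.asarray(measured, float))
    if loss == "euclidean":
        return d
    if loss == "squared":
        return d ** 2
    if loss == "cauchy":
        # robust: grows only logarithmically for large deviations,
        # limiting the influence of position-measurement outliers
        return (c ** 2 / 2.0) * np.log1p((d / c) ** 2)
    raise ValueError(loss)
```

The robust variant is what makes the alignment tolerant of occasional bad GPS fixes: a single outlier contributes a bounded gradient instead of dominating the adjustment.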
- a method for generating an image depth estimation neural network, the image depth estimation neural network being a neural network that estimates at least one depth measure of an image of a scene, wherein a depth measure of the image is a representation of a distance between a viewpoint from which the image was taken and a point in the scene of the image, the camera plane being a plane comprising the optical center of the camera and orthogonal to the optical axis of the camera, the method comprising: receiving a first set of images, the first set of images being a plurality of images of a scene taken by one or more cameras; receiving, for each image in the first set of images, an associated focal length, the associated focal length being an estimate of a focal length of the camera that took the image; transforming the first set of images into a set of normalized training images, the set of normalized training images representing how images of the first set of images would appear if the images of the set had a joint focal length, wherein transforming an image of the first set of images into a normalized training image comprises rescaling the image to represent a change in the associated focal length of the image such that it approaches the joint focal length; and training the image depth estimation neural network using a training dataset comprising at least one pair of input data and output data derived from the set of normalized training images.
- a distance between a viewpoint from which the image was taken and a point in the scene may be the distance between the optical center of the camera and the point in the scene.
- a representation of said distance may e.g. be said distance itself or e.g. the shortest distance between the point in the scene and the camera plane.
- the method according to the second aspect may enable training of the image depth estimation neural network using images from a variety of cameras with different focal lengths, as the images are transformed into normalized training images before training.
- different focal lengths of the cameras used for the training data could have degraded the performance of the neural network.
- a camera with a large focal length may provide a zoomed-in version of an image with a small focal length.
- Training the neural network without accounting for the different focal lengths may therefore result in a neural network that does not provide reliable depth estimations.
- a hypothesis is that, without normalization of the focal length, the neural network must accurately predict the focal length itself, a task that may be difficult. Put differently, by providing normalized data, less data is needed to train the neural network adequately.
- An advantage of the method according to the second aspect is that it may generate an accurate image depth estimation neural network.
- since the image depth estimation neural network may be trained using normalized training images that are derived from images from a variety of cameras, the training dataset may be large.
- the training dataset may be much larger than if only cameras with depth sensors, or only cameras with a specific focal length, could be used.
- a large training dataset may improve the accuracy.
- the image depth estimation neural network may become more adapted to handle images from any kind of camera when the network is deployed in 3D modelling of a scene in a situation after training. For example, image distortion may be linked to the focal length.
- the method according to the second aspect may have several advantages compared to the CAM-Convs method.
- the method according to the second aspect may not rely on concatenating camera internal parameters to feature maps.
- the method according to the second aspect may not rely on informing the neural network about e.g. the viewing angles of the pixels in the images. Instead, these angles may be intrinsically learned by the neural network.
- the rescaling of the images may ensure that every pixel in the normalized training images always corresponds to the same viewing angle during training.
- the method according to the second aspect may be implemented on a large range of neural network architectures. It may even be implemented on any type of neural network architecture. In contrast, methods relying on concatenated parameters may need to be implemented on special types of architectures, such as u-net architectures.
- the method according to the second aspect may be a computationally less demanding way of generating an image depth estimation neural network than the CAM-Convs method. Tests have indicated that the generated image depth estimation neural network has at least a comparable performance, regardless of the method. Under some deployment conditions, a neural network generated according to the method of the second aspect outperforms neural networks generated according to the CAM-Convs method.
- the method according to the second aspect may be configured to generate an encoder-decoder neural network.
- the encoder-decoder neural network may have an architecture with skip connections. However, as mentioned other architectures are also possible.
- the method according to the second aspect may be configured to produce a feature map for each normalized training image, wherein the feature map is smaller than the normalized training image, e.g. 16 times smaller.
- each image has an associated focal length, i.e. an estimate of a focal length of the camera that took the image.
- the associated focal length may be an estimate of a focal length that comes from metadata associated with the image.
- the associated focal length may alternatively be an estimate of a focal length acquired by running a camera model characterizing structure from motion algorithm on a plurality of images taken by cameras of the same model as the camera that took the image in the first set of images.
- the camera model characterizing structure from motion algorithm may be configured to iteratively refine both a 3D model and a focal length estimate of images taken by the same camera model.
- the camera model characterizing structure from motion algorithm may additionally refine distortion parameters of the camera model.
- when an image of the first set of images is transformed into a normalized training image, the image is rescaled. It should be understood that the image may be rescaled around a central point in the image, e.g. around a central pixel. It should also be understood that all images of the first set of images may be rescaled to the same size.
- Transforming an image of the first set of images may, in addition to rescaling, comprise a distortion reduction, wherein image distortions are decreased.
- the distortion reduction may be based on distortion parameters extracted by the camera model characterizing structure from motion algorithm. The distortion reduction may be done before the rescaling.
- the rescaling of an image may be done such that the associated focal length of the image approaches the joint focal length.
- the set of normalized training images may all have the same joint focal length.
- the associated focal length of the normalized training images may be within ±10% of the joint focal length or within ±20% of the joint focal length.
- the precision of the associated focal length of the normalized training images may correspond to a precision in the image depth estimates of the image depth estimation neural network. In some applications a reduced precision may be acceptable in the depth estimates and then the rescaling may not need to be done such that the associated focal lengths of the normalized training images all are the same.
- the at least one pair of input data and output data in the training dataset may comprise a normalized training image as input data and at least one depth measure of said normalized training image as output data.
- the neural network may be trained in converting input data into output data.
- the neural network may be trained in converting an image which has a focal length corresponding to the joint focal length into at least one depth measure.
- the at least one depth measure of the normalized training image in the output data may be a calculated depth measure, calculated based on an image series.
- the at least one depth measure of the normalized training image may be a directly measured depth measure, e.g. measured using a RGB-D camera.
- the depth measures of the images of the set of normalized training images may comprise only calculated depth measures, only directly measured depth measures or a mix of calculated depth measures and directly measured depth measures.
- the normalized training image of the input data and the at least one depth measure of the output data may be derived from one of the pairs of image data and depth data of a dataset that is generated according to the method of the first aspect of the invention.
- the depth measures of the images of the set of normalized training images may comprise calculated depth measures according to the first aspect. It should be understood that when an image from the dataset according to the first aspect of the invention is rescaled and entered as input data in the training dataset, the representation of the at least one depth measure of the corresponding image may be transformed accordingly before being entered as output data in the training dataset. For example, consider a dataset that is generated according to the first aspect of the invention wherein the dataset comprises one image and one depth map of the image in each pair. When the image of a pair is rescaled, the depth map may also be rescaled accordingly. The rescaled image and the rescaled depth map may then be entered into the training dataset as a pair of input data and output data.
- the rescaling of an image of the first set of images may comprise scaling the image by a factor, the factor being inversely proportional to the focal length associated with the image.
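The paired rescaling of an image and its depth map might look like the following sketch, assuming grayscale images and using nearest-neighbour resampling as an assumed interpolation scheme:

```python
import numpy as np

def rescale_pair(image, depth_map, f_image, f_joint):
    """Rescale a grayscale image and its depth map by the same factor so
    the pair stays consistent when entered into the training dataset.
    Nearest-neighbour resampling; the factor is inversely proportional to
    the focal length associated with the image."""
    factor = f_joint / f_image
    h, w = image.shape[:2]
    nh, nw = int(round(h * factor)), int(round(w * factor))
    # map each output pixel back to its source pixel (nearest neighbour)
    rows = np.clip((np.arange(nh) / factor).astype(int), 0, h - 1)
    cols = np.clip((np.arange(nw) / factor).astype(int), 0, w - 1)
    return image[np.ix_(rows, cols)], depth_map[np.ix_(rows, cols)]
```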
- the factor may be the joint focal length of the set of normalized training images divided by the focal length associated with the image.
- each image of the set of normalized training images may have an associated focal length that is the joint focal length.
- for example, the joint focal length may be a factor 2 larger than f, the focal length associated with the image, giving a scale factor of 2.
- Such a rescaling may correspond to moving pixel information in the image a factor 2 away from the center pixel in both the x- and y-direction, thereby mimicking a factor 2 change in focal length.
- the first set of images may comprise images associated with a plurality of focal lengths.
- the neural network may be trained to handle images coming from a variety of cameras with a variety of focal lengths.
- the rescaling of an image may comprise: cropping the image if the focal length associated with the image is smaller than the joint focal length; padding the image if the focal length associated with the image is larger than the joint focal length.
- Cropping and padding may be image processing tasks that require only small computational resources. Thus, the rescaling may be performed efficiently.
- Cropping the image may comprise removing outer parts of the image. For example, images from the first set of images that have a focal length smaller than the joint focal length may be rescaled in a manner corresponding to zooming in, and pixel information ending up outside the image boundaries may be cropped.
- Padding the image may comprise introducing new pixels around the outer parts of the image, e.g. pixels comprising no image information or pixels comprising a fixed value such as zero. For example, images from the first set of images that have a focal length larger than the joint focal length may be rescaled in a manner corresponding to zooming out, and zero-valued pixels may be introduced along the image boundaries.
- the pixel information of the images may also be redistributed in conjunction with cropping or padding such that all the images in the set of normalized training images have the same number of pixels and aspect ratio.
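Combining the zoom with centre cropping or zero padding to a fixed output size, so that all normalized images share the same pixel count and aspect ratio, might be sketched as follows (grayscale images, nearest-neighbour resampling, and the output size are assumptions):

```python
import numpy as np

def normalize_focal_length(image, f_image, f_joint, out_h, out_w):
    """Zoom a grayscale image by f_joint / f_image around the centre, then
    centre-crop (if the result is larger) or zero-pad (if smaller) to a
    fixed (out_h, out_w) canvas."""
    factor = f_joint / f_image
    h, w = image.shape[:2]
    nh, nw = int(round(h * factor)), int(round(w * factor))
    rows = np.clip((np.arange(nh) / factor).astype(int), 0, h - 1)
    cols = np.clip((np.arange(nw) / factor).astype(int), 0, w - 1)
    zoomed = image[np.ix_(rows, cols)]            # nearest-neighbour zoom
    out = np.zeros((out_h, out_w), dtype=image.dtype)  # zero padding
    # centre-crop the zoomed image or centre-place it on the canvas
    r0 = max((nh - out_h) // 2, 0)
    c0 = max((nw - out_w) // 2, 0)
    R0 = max((out_h - nh) // 2, 0)
    C0 = max((out_w - nw) // 2, 0)
    ch, cw = min(nh, out_h), min(nw, out_w)
    out[R0:R0 + ch, C0:C0 + cw] = zoomed[r0:r0 + ch, c0:c0 + cw]
    return out
```

A focal length smaller than the joint focal length gives a zoom factor above 1 and hence a crop; a larger focal length gives a factor below 1 and hence zero padding, matching the cropping and padding cases described above.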
- a method for constructing a three-dimensional (3D) model of a scene comprising: receiving a set of scene images, the set of scene images being images of the scene taken from a plurality of viewpoints; receiving, for each of the scene images, an associated focal length, the associated focal length being an estimate of the focal length of the camera that took the scene image; transforming the set of scene images into a set of normalized images, the set of normalized images representing how the set of scene images would appear if the images of the set had a joint focal length, wherein transforming a scene image into a normalized image comprises rescaling the image to represent a change in the associated focal length of the image such that it approaches the joint focal length; obtaining at least one estimate of a depth measure of at least one image of the set of normalized images using an image depth estimation neural network, wherein a depth measure of an image is a representation of a distance between a viewpoint from which the image was taken and a point in the scene of the image; and constructing the 3D model of the scene based on the set of normalized images and the at least one estimate of the depth measure.
- a distance between a viewpoint from which the image was taken and a point in the scene may be the distance between the optical center of the camera and the point in the scene.
- a representation of said distance may e.g. be said distance itself or e.g. the shortest distance between the point in the scene and the camera plane.
- An advantage of the method according to the third aspect is that it may provide an accurate 3D model of the scene.
- since the method according to the third aspect transforms the set of scene images into a set of normalized images, the method may utilize an image depth estimation neural network that is trained on normalized images, e.g. an image depth estimation neural network generated according to the second aspect.
- accurate depth estimates may be outputted.
- Accurate depth measures may by extension translate into an accurate 3D model.
- Another advantage of the method according to the third aspect is that it may be versatile.
- the accuracy of the 3D model may be indifferent to, or depend only weakly on, which camera has generated the images of the set of scene images.
- the set of scene images may e.g. comprise images taken by different cameras with different focal lengths.
- the method may avoid introducing inaccuracies in the 3D model due to the cameras having different camera focal lengths or different distortion properties.
- Another advantage of the method according to the third aspect is that it may be cost-effective as it may be based on an image depth estimation neural network that may be generated at low cost from images that are collected not solely for the purpose of training the image depth estimation neural network.
- each scene image has an associated focal length, i.e. an estimate of a focal length of the camera that took the image.
- the associated focal length may be an estimate of a focal length that comes from metadata associated with the image.
- the associated focal length may alternatively be an estimate of a focal length acquired by running a camera model characterizing structure from motion algorithm on a plurality of images taken by cameras of the same model as the camera that took the image in the first set of images.
- the camera model characterizing structure from motion algorithm may be configured to iteratively refine both a 3D model and a focal length estimate of images taken by the same camera model.
- the camera model characterizing structure from motion algorithm may additionally refine distortion parameters of the camera model.
- Transforming an image of the set of scene images may, in addition to rescaling, comprise a distortion reduction, wherein image distortions are decreased.
- the distortion reduction may be based on distortion parameters extracted by the camera model characterizing structure from motion algorithm. The distortion reduction may be done before the rescaling.
- the rescaling of an image may be done such that the associated focal length of the image approaches the joint focal length of the method according to the third aspect.
- the set of normalized images may all have the same joint focal length.
- the associated focal length of the normalized images may be within ±10% of the joint focal length or within ±20% of the joint focal length.
- the precision of the associated focal length of the normalized images may correspond to an accuracy in the 3D model constructed by the method according to the third aspect. In some applications a reduced accuracy may be acceptable in the 3D model and then the rescaling may not need to be done such that the associated focal lengths of the normalized images all are the same.
- a depth measure of a normalized image may be obtained by inputting the normalized image into the image depth estimation neural network such that the image depth estimation neural network outputs at least one estimate of a depth measure of said normalized image.
- the method may, in addition to constructing the 3D model of the scene, also form a reconstructed camera pose for at least one image of the set of scene images, or at least two images of the set of scene images. In some instances, the method may also form a reconstructed camera pose for all images of the set of scene images, wherein the reconstructed camera pose comprises a camera position and a camera orientation.
- a camera orientation may herein be an orientation of the camera relative to the 3D axes containing the 3D model. The camera orientation may define the pan, tilt, and roll of the camera.
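By way of illustration, a camera orientation expressed as pan, tilt, and roll can be composed into a rotation matrix relative to the 3D axes. The sketch below is a hedged example: the axis conventions and composition order are assumptions chosen for illustration, not something the text prescribes.

```python
import math

def rotation_matrix(pan, tilt, roll):
    """Compose a 3x3 camera rotation matrix from pan, tilt, and roll
    angles (radians). The axis conventions and composition order are
    assumptions for illustration; a real pipeline must match the
    coordinate frame of the 3D model."""
    cp, sp = math.cos(pan), math.sin(pan)
    ct, st = math.cos(tilt), math.sin(tilt)
    cr, sr = math.cos(roll), math.sin(roll)
    rz = [[cp, -sp, 0.0], [sp, cp, 0.0], [0.0, 0.0, 1.0]]   # pan about z
    rx = [[1.0, 0.0, 0.0], [0.0, ct, -st], [0.0, st, ct]]   # tilt about x
    ry = [[cr, 0.0, sr], [0.0, 1.0, 0.0], [-sr, 0.0, cr]]   # roll about y
    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]
    return matmul(rz, matmul(rx, ry))
```

With all angles zero the matrix is the identity, i.e. the camera frame coincides with the 3D model's axes.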
- the image depth estimation neural network may be a neural network that estimates a depth measure of an image based on said image alone, thereby providing a single-image-depth-estimate.
- the image depth estimation neural network may be a neural network generated according to the method of the second aspect of the invention.
- the method for constructing a 3D model according to the third aspect may thereby make use of the advantages that a neural network generated according to the method of the second aspect provides.
- the rescaling of the images in the set of scene images in the method for constructing a 3D model according to the third aspect of the invention and the rescaling of the images in the first set of images in the method for generating the image depth estimation neural network according to the second aspect of the invention may share a common joint focal length.
- the method for generating an image depth estimation neural network may rescale images to appear as if they had a joint focal length fc,training, and then use these images to train the image depth estimation neural network.
- the method for constructing a 3D model of a scene may rescale scene images to appear as if they had a joint focal length fc,modelling, and then input the rescaled images into said image depth estimation neural network.
- Processing images associated with the set of scene images together with the at least one estimate of a depth measure may comprise: reconstructing a camera position and a camera orientation for each image using a 3D reconstruction pipeline; detecting an object using an object detection pipeline, wherein the object appears in at least two images of the set of scene images; extracting an object depth measure for the object for each of the at least two images from the at least one estimate of a depth measure, wherein an object depth measure is derived from at least a depth measure for at least one point on the object; forming an object position in the 3D model, wherein the object position comprises coordinates of the object, by either: finding an estimated object position from each of the at least two images, the estimated object position of an image being based on the reconstructed camera position, the reconstructed camera orientation, and the object depth measure of the image, and forming the object position from the estimated object positions; or finding a triangulated object position, the triangulated object position being based on triangulation using the reconstructed camera positions and the reconstructed camera orientations.
- the 3D reconstruction pipeline may be a structure from motion pipeline or a simultaneous localization and mapping pipeline.
- the object detection pipeline may e.g. be implemented as described in [Seamless Scene Segmentation, Porzi et al., The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 8277-8286].
- a server for generating a dataset for training an image depth estimation neural network comprising a processor configured to: receive an image series comprising a plurality of images of a scene, each image being an image acquired by a camera, the camera having a position at the time of the acquisition of the image; receive, for each image of the image series, a measured camera position, the measured camera position being the position of the camera measured at the acquisition of the image; form a three-dimensional (3D) reconstruction of the scene, the 3D reconstruction comprising coordinates of a 3D model of the scene and a reconstructed camera position for each image, the reconstructed camera position being an estimate of the position of the camera relative to the 3D model at the acquisition of the image, wherein the 3D reconstruction of the scene is formed by running a structure from motion algorithm on the image series, wherein the structure from motion algorithm is configured to align the 3D reconstruction to the measured camera positions; calculate at least one depth measure of at least one image of the image series based on the 3D reconstruction; and form the dataset out of pairs of image data and depth data, wherein each pair comprises an image of the image series as image data and the at least one depth measure of the corresponding image as depth data.
- the server may be a single physical server or a distributed server, a distributed server comprising a plurality of physical servers, possibly distributed over a number of locations, acting together.
- the server may comprise a memory and at least one receiver.
- the memory may store computer readable instructions for the processor to perform.
- the memory may also store the formed dataset.
- the at least one receiver may be configured to receive the image series and the measured camera positions.
- a server may allow images to be collected from a plurality of cameras and combined into a dataset, which by extension may improve 3D modelling of scenes.
- a server for generating an image depth estimation neural network, the image depth estimation neural network being a neural network that estimates at least one depth measure of an image of a scene, wherein a depth measure of the image is a representation of a distance between a viewpoint from which the image was taken and a point in the scene of the image.
- the server comprising a processor configured to: receive a first set of images, the first set of images being a plurality of images of a scene taken by one or more cameras; receive, for each image in the first set of images, an associated focal length, the associated focal length being an estimate of a focal length of the camera that took the image; transform the first set of images into a set of normalized training images, the set of normalized training images representing how images of the first set of images would appear if the images of the set had a joint focal length, wherein transforming an image of the first set of images into a normalized training image comprises rescaling the image, the rescaling representing a change in the associated focal length of the image such that it approaches the joint focal length; and train the neural network to predict at least one depth measure of an image using a training dataset whose input data comprises the set of normalized training images.
- the server may be a single physical server or a distributed server, a distributed server comprising a plurality of physical servers, possibly distributed over a number of locations, acting together.
- the server may comprise a memory and at least one receiver.
- the memory may store computer readable instructions for the processor to perform.
- the memory may also store the generated image depth estimation neural network.
- the at least one receiver may be configured to receive the first set of images and the associated focal lengths.
- a server may allow training the image depth estimation neural network with images collected from a plurality of cameras, which by extension may improve 3D modelling of scenes.
- a server for constructing a three-dimensional (3D) model of a scene comprising a processor configured to: receive a set of scene images, the set of scene images being images of the scene taken from a plurality of viewpoints; receive, for each of the scene images, an associated focal length, the associated focal length being an estimate of the focal length of the camera that took the scene image; transform the set of scene images into a set of normalized images, the set of normalized images representing how the set of scene images would appear if the images of the set had a joint focal length, wherein transforming a scene image into a normalized image comprises rescaling the image to represent a change in the associated focal length of the image such that it approaches the joint focal length; obtain at least one estimate of a depth measure of at least one image of the set of normalized images using an image depth estimation neural network, wherein a depth measure of an image is a representation of a distance between a viewpoint from which the image was taken and a point in the scene of the image; and process images associated with the set of scene images together with the at least one estimate of a depth measure to construct the 3D model of the scene.
- the server may be a single physical server or a distributed server, a distributed server comprising a plurality of physical servers, possibly distributed over a number of locations, acting together.
- the server may comprise a memory and at least one receiver.
- the memory may store computer readable instructions for the processor to perform.
- the memory may also store the image depth estimation neural network and/or the constructed 3D model.
- the at least one receiver may be configured to receive the set of scene images and the associated focal lengths.
- a server according to the sixth aspect may allow improved 3D modelling based on images collected from a plurality of cameras.
- the servers of the fourth, fifth, and sixth aspect may be different servers. Alternatively, two or more servers of the fourth, fifth, and sixth aspect may be implemented on the same server.
- FIG. 1 illustrates a flowchart of a method for generating a dataset for training an image depth estimation neural network.
- FIG. 2 illustrates an example of a flow of data during implementation of a method for generating a dataset for training an image depth estimation neural network.
- FIG. 3 illustrates a flowchart of a method for generating an image depth estimation neural network.
- FIG. 4 illustrates an example of a flow of data during implementation of a method for generating an image depth estimation neural network.
- FIG. 5 illustrates a flowchart of a method for constructing a 3D model of a scene.
- Fig. 6 illustrates an example of a flow of data during implementation of a method for constructing a 3D model of a scene.
- Fig. 1 illustrates a flowchart of a method 100 for generating a dataset for training an image depth estimation neural network 50. It should be understood that the steps of the method 100 do not necessarily need to be performed in the order depicted in Fig. 1.
- the method 100 will hereinafter be described, by way of example, using the flow of data illustrated in Fig. 2. However, it should be understood that other implementations of the method 100 may also be possible.
- the method 100 comprises receiving 102 an image series 10 comprising a plurality of images 11 of a scene, each image 11 being an image acquired by a camera, the camera having a position at the time of the acquisition of the image. As illustrated in Fig. 2, the images 11 of the image series 10 may be associated with other types of data than image data.
- Each image may e.g. be associated with an indication 12 of a measured camera position, and/or an indication 14 of a camera model, and/or an indication 16 of an associated focal length.
- the mentioned indications 12, 14, 16 may be metadata associated with the image 11. It should be understood that Fig. 2 is a schematic illustration of the images 11. The indications 12, 14, 16 may not be part of the image information that depicts the scene.
- the method 100 further comprises receiving 104, for each image of the image series, a measured camera position, the measured camera position being the position of the camera measured at the acquisition of the image.
- the measured camera position may e.g. be received 104 as an indication 12 of a measured camera position from metadata associated with the image 11.
- the indication 12 of a measured camera position may e.g. be a position measured with a positioning system, e.g. a GPS, Wi-Fi, or cell-ID positioning system, at the time the image 11 was acquired.
- the method 100 further comprises forming 106 a 3D reconstruction of the scene, the 3D reconstruction comprising coordinates of a 3D model of the scene and a reconstructed camera position for each image, the reconstructed camera position being an estimate of the position of the camera relative to the 3D model at the acquisition of the image, wherein the 3D reconstruction of the scene is formed by running a structure from motion algorithm on the image series, wherein the structure from motion algorithm is configured to align the 3D reconstruction to the measured positions.
- the structure from motion algorithm may herein be the algorithm implemented in OpenSfM [https://github.com/mapillary/OpenSfM].
- Aligning the 3D reconstruction to the measured positions may be done through bundle adjustment.
- an initial 3D reconstruction may be improved by the bundle adjustment, where the coordinates of a 3D model of the scene and the coordinates of the reconstructed camera positions are simultaneously refined according to one or more cost functions that penalize 3D reconstructions that deviate from what is believed to be the ground truth.
- a cost function that is proportional to the squared distance between the reconstructed camera position and the corresponding measured camera position may be used. For example, such a cost function may be added to the alignment step in OpenSfM. This may impose a scale on the 3D reconstruction, wherein the scale matches the scale given by the measured camera positions.
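The squared-distance term above can be sketched as a position prior added to the bundle-adjustment objective. The helper below is a hedged illustration with hypothetical names; OpenSfM's actual alignment code is structured differently.

```python
def position_prior_cost(reconstructed_positions, measured_positions, weight=1.0):
    """Sum of weighted squared distances between reconstructed and
    measured camera positions. Adding this term to the bundle-adjustment
    objective pulls the reconstruction onto the scale given by the
    measured (e.g. GPS) positions. Hypothetical helper for illustration."""
    total = 0.0
    for rec, meas in zip(reconstructed_positions, measured_positions):
        total += weight * sum((r - m) ** 2 for r, m in zip(rec, meas))
    return total
```

Because the term is proportional to squared distance, a reconstruction whose cameras sit exactly on the measured positions contributes zero cost, and any residual offset is penalized quadratically.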
- the method 100 further comprises calculating 108 at least one depth measure of at least one image 11 of the image series based on the 3D reconstruction, wherein a depth measure of an image 11 is a representation of a distance between a viewpoint from which the image was taken and a point in the scene of the image.
- Depth measures may be calculated e.g. through a Patch-Match based multi-view stereo algorithm such as [S. Shen, IEEE Transactions on Image Processing, 22(5):1901-1914, May 2013]. This may be seen as a simple winner-takes-all stereo algorithm. Different depth and normal values may be tested for each pixel and the one that gives the best normalized cross correlation score with the neighboring views may be kept. The result may be a dense but noisy depth map. Most of the noise in the depth maps may be removed in a post-processing step that checks the consistency between the depth maps of neighboring images. Depth values that are not consistent with at least two neighboring views may be removed. This may reduce the number of pixels for which a depth value is produced.
- Depth maps for which the number of pixels with a depth value is below a threshold value, e.g. below 5% of the total number of pixels in the image 11, may be discarded together with the corresponding image 11.
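The coverage check above can be sketched as a simple filter. This is a minimal example under the assumption that pixels without a depth value are marked with 0 or None; the 5% threshold follows the example in the text.

```python
def keep_depth_map(depth_map, min_valid_fraction=0.05):
    """Return True if enough pixels of the depth map carry a depth
    value to keep the depth map (and its image) in the dataset.
    depth_map: 2D list where 0 or None marks pixels with no depth."""
    total = sum(len(row) for row in depth_map)
    valid = sum(1 for row in depth_map for d in row if d)
    return total > 0 and valid / total >= min_valid_fraction
```

A depth map with only a handful of surviving depth values would add little supervision signal, so discarding it (together with its image) keeps the dataset informative.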
- the method 100 may thus calculate 108 at least one depth measure of at least one image 11 in the form of a depth map 23 of the at least one image 11 , wherein the at least one depth measure may correspond to the pixels of the depth map having a depth value.
- the at least one depth measure may also be represented in other ways, e.g. as a single value or as a vector of values.
- the method 100 further comprises forming 110 the dataset 20 out of pairs 21 of image data and depth data, wherein each pair 21 comprises an image 11 of the image series 10 as image data and the at least one depth measure of the corresponding image as depth data.
- the pairs 21 of image data and depth data are pairs 21 of images 11 and their corresponding depth maps 23. As illustrated in Fig. 2 there may be fewer images 11 in the dataset 20 than in the original image series 10.
- Fig. 3 illustrates a flowchart of a method 200 for generating an image depth estimation neural network 50. It should be understood that the steps of the method 200 do not necessarily need to be performed in the order depicted in Fig. 3.
- the method 200 comprises receiving 202 a first set of images, the first set of images being a plurality of images of a scene taken by one or more cameras.
- the method 200 further comprises receiving 204, for each image in the first set of images, an associated focal length, the associated focal length being an estimate of a focal length of the camera that took the image.
- the associated focal length may be received in several different ways.
- an image of the first set of images may be associated with an indication 14 of a camera model, e.g. GoPro Hero 8.
- the indication 14 of the camera model may be stored in metadata associated with the image. Images from other cameras of the same camera model may have been analyzed on a prior occasion and an estimate of the focal length may have been extracted and stored in a database.
- given an indication 14 of a camera model, the associated focal length may be received from the database.
- the associated focal length may also be stored directly in metadata of the image, e.g. as an indication 16 of the associated focal length. Said indication 16 may e.g. be provided by the manufacturer of the camera or added as metadata in a prior analyzing step of the image.
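The two sources described above (a focal-length tag in the image metadata, falling back to a per-camera-model database) can be sketched as a lookup chain. The metadata keys and database entries below are hypothetical, chosen only for illustration.

```python
# Hypothetical camera-model -> focal-length database, populated on a
# prior occasion by analyzing images from cameras of the same model.
FOCAL_DB = {"GoPro Hero 8": 3.0}

def associated_focal_length(metadata):
    """Resolve an image's associated focal length: prefer an explicit
    focal-length entry in the image metadata, then fall back to a
    database lookup keyed on the camera model. Returns None when
    neither source is available."""
    if "focal_length" in metadata:
        return metadata["focal_length"]
    model = metadata.get("camera_model")
    return FOCAL_DB.get(model)
```

In practice the metadata would come from e.g. EXIF tags, and the database from a prior structure-from-motion analysis as described earlier in the text.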
- the method 200 further comprises transforming 206 the first set of images into a set 30 of normalized training images, the set 30 of normalized training images representing how images of the first set of images would appear if the images of the set had a joint focal length, wherein transforming 206 an image of the first set of images into a normalized training image comprises rescaling the image, the rescaling representing a change in the associated focal length of the image such that it approaches the joint focal length.
- the joint focal length of the method 200 may be a pre-determined focal length; it need not necessarily be one of the associated focal lengths of the images of the first set of images.
- Transforming 206 an image such that the associated focal length of the image approaches the joint focal length may mean rescaling the image such that the associated focal length of the rescaled image is equal to the joint focal length. However, it may also mean that the associated focal length of the rescaled image is similar to the joint focal length.
- the rescaling of an image of the first set of images may comprise scaling the image by a factor.
- the factor may be the joint focal length of the set of normalized training images divided by the focal length associated with the image.
- the associated focal length of the rescaled image may be equal to the joint focal length.
- the rescaling of an image of the first set of images may further comprise: cropping the image if the focal length associated with the image is smaller than the joint focal length, or padding the image if the focal length associated with the image is larger than the joint focal length. If the factor is larger than one, the fraction of the image corresponding to the fraction of the factor above one may be cropped. Padding may be implemented analogously.
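The scaling factor and the crop/pad rule above can be sketched as follows. This is a hedged example: it only computes the factor and the required correction, under the assumption that focal lengths are expressed in consistent units (e.g. pixels); the actual image resampling is omitted.

```python
def normalize_to_joint_focal(width, height, focal, joint_focal):
    """Compute the rescale factor (joint focal length divided by the
    image's associated focal length) and the correction needed to
    restore the original image dimensions: crop when the image is
    upscaled (factor > 1), pad when it is downscaled (factor < 1)."""
    factor = joint_focal / focal
    new_w, new_h = round(width * factor), round(height * factor)
    if factor > 1:
        action = "crop"   # associated focal length smaller than joint
    elif factor < 1:
        action = "pad"    # associated focal length larger than joint
    else:
        action = "none"
    return factor, new_w, new_h, action
```

For instance, an image whose associated focal length is half the joint focal length is scaled up by a factor of two and then cropped back to its original dimensions.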
- the method 200 further comprises training 208 the neural network to predict at least one depth measure of an image, wherein training the neural network comprises providing the neural network with a training dataset of pairs of input data 54 and output data 56, wherein the input data 54 of the training dataset comprises the set 30 of normalized training images.
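Since the calculated depth maps are sparse (many pixels were filtered out in the consistency check), a training objective must ignore pixels without a target depth. The sketch below shows one such masked loss in plain Python; the loss choice and data layout are assumptions, and a real setup would use a tensor framework.

```python
def masked_depth_loss(predicted, target):
    """Mean squared error over pixels that carry a ground-truth depth.
    Pixels whose target depth is 0 or None (no depth value survived
    the filtering) do not contribute to the training loss."""
    errors = []
    for p_row, t_row in zip(predicted, target):
        for p, t in zip(p_row, t_row):
            if t:  # only supervise pixels with a valid depth value
                errors.append((p - t) ** 2)
    return sum(errors) / len(errors) if errors else 0.0
```

The same masking idea applies whatever the concrete loss (L1, scale-invariant, etc.); the essential point is that unsupervised pixels are excluded from the average.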
- Fig. 4 illustrates an example of a flow of data leading to providing the neural network with a training dataset of pairs of input data 54 and output data 56, such that it may be trained to form an image depth estimation neural network 50.
- However, it should be understood that other implementations of the method 200 may also be possible.
- In Fig. 4 the image depth estimation neural network 50 is trained using data derived from one or more datasets 20 generated as described in conjunction with Figs. 1 and 2.
- the images 11 of the dataset 20 herein form the first set of images.
- while the images 11 of the dataset 20 may have been derived using an image series 10 of images 11 that were interrelated in some way, e.g. depicting the same scene from different viewpoints or angles, having some image overlap etc., this may not necessarily be the case for the images of the first set of images.
- a first image series 10 of interrelated images with corresponding measured camera positions may be used to generate a first dataset 20 and a second image series 10 of interrelated images with corresponding measured camera positions may be used to generate a second dataset 20.
- the first set of images with associated depth maps may be formed from one image/depth map from the first image series 10, one image/depth map from the second image series 10, and so forth.
- when training the image depth estimation neural network 50, there may not be any requirement on the images being interrelated, being images of the same scene, or being associated with measured camera positions.
- the only requirement on the images may be that their depth maps are accurate. Each depth map may in turn be accurate due to it being formed from an image series 10 of interrelated images which are associated with measured camera positions. However, at the point of training the image depth estimation neural network 50, such requirement may no longer be needed.
- the images 11 of the dataset 20 are transformed 206 into a set 30 of normalized training images.
- the normalized training images are annotated 11' in Fig. 4 to indicate that they are images derived from images 11.
- the set 30 of normalized training images then forms the input data 54 for training the neural network.
- each image 11 is associated with a depth map representing at least one depth measure of the image.
- the depth maps 23 may be transformed 220 into a transformed depth map 23’, wherein the transformation mimics the transformation of the corresponding image 11.
- each pixel in the transformed depth map 23' may be linked to the same image information as before the transformation. If the position of e.g. a stop sign changes during the transformation of the image, the depth measure of the stop sign may move accordingly within the depth map.
- the transformed depth map 23’ may subsequently be used as output data 56 for training the neural network. With said input data 54 and output data 56 the neural network may be trained to form the image depth estimation neural network 50.
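Applying the same rescale factor to the depth map as to its image can be sketched with a nearest-neighbour lookup, which keeps each depth value linked to the same image content. This is a minimal illustration; it deliberately avoids interpolating between sparse depth values, since averaging a valid depth with a "no depth" marker would corrupt the supervision.

```python
def rescale_depth_map(depth_map, factor):
    """Rescale a 2D depth map by the same factor as its image, using
    nearest-neighbour lookup so that each output pixel carries the
    depth value of the source pixel it came from."""
    h, w = len(depth_map), len(depth_map[0])
    new_h, new_w = round(h * factor), round(w * factor)
    return [[depth_map[min(int(i / factor), h - 1)][min(int(j / factor), w - 1)]
             for j in range(new_w)]
            for i in range(new_h)]
```

Upscaling by a factor of two, for example, turns each depth pixel into a 2x2 block at the position where the corresponding image content ends up.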
- Fig. 5 illustrates a flowchart of method 300 for constructing a 3D model of a scene. It should be understood that the steps of the method 300 do not necessarily need to be performed in the order depicted in Fig. 5. The method 300 will hereinafter be described, by way of example, using the flow of data illustrated in Fig. 6. However, it should be understood that other implementations of the method 300 may also be possible.
- the method 300 comprises receiving 302 a set 60 of scene images 61, the set 60 of scene images 61 being images of the scene taken from a plurality of viewpoints. Each image of the set 60 of scene images 61 may at least partially overlap with another image of the set 60 of scene images 61.
- the method 300 further comprises receiving 304, for each of the scene images 61, an associated focal length, the associated focal length being an estimate of the focal length of the camera that took the scene image.
- the associated focal length may be received in several different ways, as previously described.
- an image 61 of the set 60 of scene images may be associated with an indication 14 of a camera model, e.g. an indication 14 stored in metadata associated with the image.
- the associated focal length may be received from a database.
- the associated focal length may also be stored directly in metadata of the image, e.g. as an indication 16 of the associated focal length.
- the method 300 further comprises transforming 306 the set 60 of scene images 61 into a set 70 of normalized images 61', the set 70 of normalized images 61' representing how the set 60 of scene images 61 would appear if the images of the set had a joint focal length, wherein transforming a scene image 61 into a normalized image 61' comprises rescaling the image to represent a change in the associated focal length of the image such that it approaches the joint focal length.
- the joint focal length of the method 300 may be a pre-determined focal length.
- the rescaling of an image of the set 60 of scene images 61 may be performed analogously to the rescaling of an image of the first set of images.
- the rescaling of an image of the set 60 of scene images 61 may comprise scaling the image by a factor.
- the factor may be the joint focal length of the set of normalized images divided by the focal length associated with the image.
- the associated focal length of the rescaled image may be equal to the joint focal length.
- the rescaling of an image of the set 60 of scene images 61 may further comprise: cropping the image if the focal length associated with the image is smaller than the joint focal length, or padding the image if the focal length associated with the image is larger than the joint focal length. If the factor is larger than one, the fraction of the image corresponding to the fraction of the factor above one may be cropped. Padding may be implemented analogously.
- the method 300 further comprises obtaining 308 at least one estimate of a depth measure of at least one image of the set of normalized images using an image depth estimation neural network 50, wherein a depth measure of an image is a representation of a distance between a viewpoint from which the image was taken and a point in the scene of the image, wherein the image depth estimation neural network is a neural network that estimates a depth measure of an image.
- the image depth estimation neural network 50 may be a neural network generated according to the method 200 for generating an image depth estimation neural network 50. Furthermore, the joint focal length of the method 300 for constructing a 3D model of a scene may be equal to the joint focal length of the method 200 for generating the image depth estimation neural network 50. Thus, images may be provided to the image depth estimation neural network 50 in the same format, i.e. with the same joint focal length, as the image depth estimation neural network 50 has been trained to handle.
- the at least one estimate of a depth measure may e.g. be an estimated depth map 83 of an image 61' of the set 70 of normalized images 61'.
- the method 300 further comprises processing images associated with the set 60 of scene images 61 together with the at least one estimate of a depth measure using a structure from motion algorithm 90 to construct the 3D model of the scene.
- the image depth estimation neural network 50 may produce an estimated depth map 83.
- the structure from motion algorithm 90 may then process each normalized image 61' with its corresponding estimated depth map 83 to construct the 3D model of the scene.
- the structure from motion algorithm 90 may also process another image associated with the scene image 61, e.g. the scene image 61 itself or another transformation of the scene image 61.
- the estimated depth map 83 is transformed correspondingly such that each pixel in the transformed estimated depth map 83 is linked to the correct image information in the transformation of the scene image 61.
- the structure from motion algorithm may reconstruct a camera position and camera orientation for each scene image 61 using a structure from motion pipeline or a simultaneous localization and mapping (SLAM) pipeline.
- An object may then be detected, using an object detection pipeline, in at least two images of the set of scene images. For each of the at least two images an estimate of a depth measure for at least one point on the object may be found. For each image of the at least two images an object depth measure may then be derived from these estimates of depth measures of the object.
- in this example the object is a stop sign. There may exist a plurality of depth measures for the object in each image, e.g. one depth measure for each point on the object for which a depth estimate exists.
- the plurality of depth measures of the object may form one object depth measure.
- the object depth measure may thus represent a distance between the viewpoint of the camera and the object, in this case the stop sign, at the time of the acquisition of the image.
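Collapsing the plurality of depth measures on an object into one object depth measure can be sketched as taking the median over the object's pixels. The use of the median here is an assumption chosen for illustration (it is robust to outlier depth estimates); the text does not prescribe a particular aggregation.

```python
def object_depth_measure(depth_map, object_mask):
    """Derive one object depth measure from the depth values at the
    pixels covered by a detected object (e.g. a stop sign), via the
    median. object_mask is a list of (row, col) pixel coordinates;
    pixels without a depth value (0/None) are ignored."""
    depths = sorted(depth_map[i][j] for (i, j) in object_mask
                    if depth_map[i][j])
    if not depths:
        return None  # no valid depth on the object in this image
    mid = len(depths) // 2
    return depths[mid] if len(depths) % 2 else (depths[mid - 1] + depths[mid]) / 2
```

The resulting value represents the distance between the camera viewpoint and the object at the time the image was acquired.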
- An object position may then be formed in the 3D model by utilizing the object depth measure. Two examples of this are given below.
- each of the at least two images gives an estimated object position.
- the estimated object position of an image may be calculated from the reconstructed camera position, the reconstructed camera orientation, and the object depth measure.
- the estimated object position may be the reconstructed camera position translated by a distance given by the object depth measure in a direction given by the camera orientation and the object’s relation to the center pixel in the image.
- the estimated object positions, one from each of the at least two images, may then together form the object position.
- the estimated object positions may be clustered to form the object position.
- other calculations based on the estimated object positions may give the object position.
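Translating the reconstructed camera position by the object depth along the viewing ray can be sketched as below. The representation of the orientation as a 3x3 world-from-camera rotation matrix, and of the viewing direction as a unit ray in camera coordinates (derived from the object's offset from the image centre and the joint focal length), are assumptions for illustration.

```python
def estimated_object_position(cam_position, cam_rotation, pixel_ray, object_depth):
    """Estimate an object's 3D position from one image: start at the
    reconstructed camera position and move by object_depth along the
    object's viewing ray rotated into world coordinates.
    cam_rotation: 3x3 world-from-camera rotation matrix.
    pixel_ray: unit ray towards the object in camera coordinates."""
    world_ray = [sum(cam_rotation[i][k] * pixel_ray[k] for k in range(3))
                 for i in range(3)]
    return [cam_position[i] + object_depth * world_ray[i] for i in range(3)]
```

Repeating this for each of the at least two images yields one estimated object position per image; these can then be averaged or clustered to form the object position.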
- a triangulated object position may be found.
- a vector may be calculated from the reconstructed camera position of the image in a direction given by the camera orientation and the object’s relation to the center pixel in the image.
- the vectors may then cross to give a triangulated object position.
- Said triangulated object position may then be compared to an estimated object position of one of the at least two images. If the triangulated object position and the estimated object position match, the triangulated object position may form the object position.
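One common way to realize the triangulation above is the midpoint method: find the closest points on the two viewing rays and return their midpoint. The sketch below is an illustration of that choice, which the text does not prescribe; p1/p2 are reconstructed camera positions and d1/d2 the viewing-ray directions derived from the camera orientations and the object's pixel offset.

```python
def triangulate_midpoint(p1, d1, p2, d2):
    """Triangulate an object position from two camera rays by taking
    the midpoint of the closest points on the two rays. Returns None
    when the rays are (near) parallel and triangulation is unreliable."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    w0 = [a - b for a, b in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None  # parallel rays never converge on a point
    t1 = (b * e - c * d) / denom   # parameter of closest point on ray 1
    t2 = (a * e - b * d) / denom   # parameter of closest point on ray 2
    q1 = [p + t1 * v for p, v in zip(p1, d1)]
    q2 = [p + t2 * v for p, v in zip(p2, d2)]
    return [(u + v) / 2 for u, v in zip(q1, q2)]
```

When the two rays actually intersect, the two closest points coincide and the midpoint is the intersection itself.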
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SE2050179 | 2020-02-17 | ||
PCT/US2021/018254 WO2021167910A1 (en) | 2020-02-17 | 2021-02-16 | A method for generating a dataset, a method for generating a neural network, and a method for constructing a model of a scene |
Publications (1)
Publication Number | Publication Date |
---|---|
EP4107699A1 true EP4107699A1 (en) | 2022-12-28 |
Family
ID=74860553
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP21710811.7A Withdrawn EP4107699A1 (en) | 2020-02-17 | 2021-02-16 | A method for generating a dataset, a method for generating a neural network, and a method for constructing a model of a scene |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP4107699A1 (en) |
CN (1) | CN115053260A (en) |
WO (1) | WO2021167910A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116152323B (en) * | 2023-04-18 | 2023-09-08 | 荣耀终端有限公司 | Depth estimation method, monocular depth estimation model generation method and electronic equipment |
CN117690095B (en) * | 2024-02-03 | 2024-05-03 | 成都坤舆空间科技有限公司 | Intelligent community management system based on three-dimensional scene |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
SG11202100949YA (en) * | 2018-08-08 | 2021-02-25 | Abyssal S A | System and method of operation for remotely operated vehicles for simultaneous localization and mapping |
- 2021
- 2021-02-16 EP EP21710811.7A patent/EP4107699A1/en not_active Withdrawn
- 2021-02-16 WO PCT/US2021/018254 patent/WO2021167910A1/en unknown
- 2021-02-16 CN CN202180011373.7A patent/CN115053260A/en active Pending
Also Published As
Publication number | Publication date |
---|---|
CN115053260A (en) | 2022-09-13 |
WO2021167910A1 (en) | 2021-08-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112785702B (en) | SLAM method based on tight coupling of 2D laser radar and binocular camera | |
US10334168B2 (en) | Threshold determination in a RANSAC algorithm | |
CA2326816C (en) | Face recognition from video images | |
KR101791590B1 (en) | Object pose recognition apparatus and method using the same | |
US10909395B2 (en) | Object detection apparatus | |
US9959625B2 (en) | Method for fast camera pose refinement for wide area motion imagery | |
CN106447601B (en) | Unmanned aerial vehicle remote sensing image splicing method based on projection-similarity transformation | |
CN104077760A (en) | Rapid splicing system for aerial photogrammetry and implementing method thereof | |
AliAkbarpour et al. | Fast structure from motion for sequential and wide area motion imagery | |
WO2020221443A1 (en) | Scale-aware monocular localization and mapping | |
CN111882655B (en) | Method, device, system, computer equipment and storage medium for three-dimensional reconstruction | |
EP4107699A1 (en) | A method for generating a dataset, a method for generating a neural network, and a method for constructing a model of a scene | |
AliAkbarpour et al. | Parallax-tolerant aerial image georegistration and efficient camera pose refinement—without piecewise homographies | |
CN114627491A (en) | Single three-dimensional attitude estimation method based on polar line convergence | |
Hallquist et al. | Single view pose estimation of mobile devices in urban environments | |
Bethmann et al. | Object-based multi-image semi-global matching–concept and first results | |
CN111325828A (en) | Three-dimensional face acquisition method and device based on three-eye camera | |
CN117456114B (en) | Multi-view-based three-dimensional image reconstruction method and system | |
Tsaregorodtsev et al. | Extrinsic camera calibration with semantic segmentation | |
CN112270748B (en) | Three-dimensional reconstruction method and device based on image | |
CN113808103A (en) | Automatic road surface depression detection method and device based on image processing and storage medium | |
EP1580684B1 (en) | Face recognition from video images | |
KR102522923B1 (en) | Apparatus and method for estimating self-location of a vehicle | |
CN117011481A (en) | Method and device for constructing three-dimensional map, electronic equipment and storage medium | |
Wang et al. | Fast and accurate satellite multi-view stereo using edge-aware interpolation |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: UNKNOWN |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
| | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
| | 17P | Request for examination filed | Effective date: 20220819 |
| | AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| | DAV | Request for validation of the european patent (deleted) | |
| | DAX | Request for extension of the european patent (deleted) | |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN |
| | 18W | Application withdrawn | Effective date: 20240109 |