GB2624542A - Stabilised 360 degree depth imaging system without image rectification - Google Patents

Stabilised 360 degree depth imaging system without image rectification

Info

Publication number
GB2624542A
GB2624542A GB2316262.1A GB202316262A GB2624542A GB 2624542 A GB2624542 A GB 2624542A GB 202316262 A GB202316262 A GB 202316262A GB 2624542 A GB2624542 A GB 2624542A
Authority
GB
United Kingdom
Prior art keywords
images
stable
disparity
disparity map
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2316262.1A
Other versions
GB202316262D0 (en)
Inventor
Cope Alex
Jamieson Eastwood Joe
Watson Matthew
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Opteran Tech Ltd
Original Assignee
Opteran Tech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Opteran Tech Ltd filed Critical Opteran Tech Ltd
Priority to GB2316262.1A priority Critical patent/GB2624542A/en
Publication of GB202316262D0 publication Critical patent/GB202316262D0/en
Publication of GB2624542A publication Critical patent/GB2624542A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/271Image signal generators wherein the generated image signals comprise depth maps or disparity maps

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

Image depth estimation comprises obtaining (201) pairs of stable preprocessed images, the images being rotationally stabilised and generating (203) a disparity map (e.g. using semi-global block matching) based on the stable preprocessed images (e.g. by minimising a cost function). The disparity map is filtered (205) (e.g. to preserve edges), and depth measurements are calculated (207) for the stable images based on the filtered map. The depth measurements are then outputted. The stable preprocessed images may be produced by receiving images of a cylindrical data structure (e.g. a cylindrical projection derived from raw images captured by vertically aligned 360 degree cameras), reducing the pixel intensity of the images and filtering the images to obtain the stable images. Outliers in the disparity map may be removed by segmentation. Disparity map filtering may comprise applying a weighted least squares disparity filter to a map. Computationally expensive image rectification of stereo image pairs is avoided.

Description

STABILISED 360 DEGREE DEPTH IMAGING SYSTEM WITHOUT IMAGE
RECTIFICATION
Technical Field
[0001] The present application relates to a system, apparatus, and method(s) of depth estimation in stereo images without rectification.
Background
[0002] Stereo images are pairs of two-dimensional images captured from slightly different viewpoints, akin to what our two eyes see in the real world. These paired images are generated and combined to create a 3D illusion, mimicking how humans perceive depth.
[0003] Numerous computer vision libraries are available to process stereo images and perform depth estimation. For example, Open Computer Vision (OpenCV) is one such conventional library. OpenCV is implemented in C++ and can be operated using Python bindings. Computer vision libraries such as OpenCV typically require image rectification, ensuring that corresponding points in the left and right images are aligned horizontally. This process simplifies the subsequent depth perception calculations.
[0004] Image rectification aims to remove perspective distortions and ensure that objects that should be parallel or at specific orientations appear as such in the rectified image. This is often achieved by transforming or warping an image in such a way that certain features or objects in the image appear in a specific, predetermined alignment or configuration. Image rectification is therefore a cumbersome and computationally intensive process, thus making the process costly.
[0005] Moreover, computer vision libraries such as OpenCV are not equipped to handle customised stereo image pairs, such as a pair of images generated from Opteran's Development Kit (ODK). Conventional computer vision libraries provide little guidance on how they process different-resolution, high frame-rate data, such as that supplied by ODK cameras, or whether they would work at all, given that most libraries, such as OpenCV, require specialised datasets that have undergone image rectification. As such, there is an unmet need for a robust method to perform depth estimation and processing of stereo images more effectively, especially in scenarios where conventional libraries fall short.
[0006] The present invention described here introduces an innovative system that systematically stabilises the depth information within a pair of stereo images, offering a novel solution in the domains of system design, apparatus development, and method implementation. This invention finds valuable application in various domains, such as machine vision and robotics.
[0007] The embodiments described below are not limited to implementations which solve any or all of the disadvantages of the known approaches described above.
Summary
[0008] This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to determine the scope of the claimed subject matter; variants and alternative features which facilitate the working of the invention and/or serve to achieve a substantially similar technical effect should be considered as falling into the scope of the invention disclosed herein.
[0009] The present invention relates to a software/algorithmic pipeline for depth estimation. Assuming a parameter defines a range of possible disparities, the range of possible disparities in the present invention may, for example, be [-1... 18] pixels for an input image. For each pixel, a dissimilarity score is thereby calculated with the corresponding pixel at each possible disparity in the matched image. The calculation thereby creates a cost volume with dimensions (W x H x D).
[0010] Next, the cost volume is aggregated by traversing one-dimensional lines in a plurality of directions across the image in a way that minimises a given energy function. The sub-pixel disparity level with minimum cost is interpolated for each pixel to create a 2D disparity map. This disparity map is filtered with the original input image as a guide to promote edge preservation. Further post-processing steps may be applied, including a top-bottom consistency check and speckle removal. The result is an accurate depth estimation of the input image. The depth estimation pipeline is tested in a variety of environments. The baseline is varied, and the impact on the sensing range is determined and described in the following sections.
[0011] The overall time complexity of the algorithmic pipeline presented is advantageously O(WHD), where the input image has a resolution (W x H) and D is the number of disparity levels considered.
[0012] In a first aspect, the present disclosure provides a computer-implemented method for image depth estimation, the method comprising: obtaining one or more stable preprocessed images, wherein said one or more stable preprocessed images are rotationally stabilised; generating at least one disparity map based on said one or more stable preprocessed images, filtering said at least one disparity map, calculating depth measurements for said one or more stable preprocessed images based on the filtered disparity map; and outputting the depth measurements.
[0013] In a second aspect, the present disclosure provides an apparatus for processing a stereo image pair, the apparatus comprising: at least two cameras for capturing the stereo image pair; and one or more modules configured to: convert one or more images of the stereo image pair to one or more stable preprocessed images that are rotationally stable, generate at least one disparity map based on said one or more stable preprocessed images, filter said at least one disparity map, calculate depth measurements for said one or more stable preprocessed images based on the filtered disparity map, and output the depth measurements for the stereo image pair.
[0014] In a third aspect, the present disclosure provides a system for estimating depth of a stereo image pair, the system comprising: at least one processor; and a memory storing instructions that, when executed by the at least one processor, cause the system to perform any method according to the first aspect.
[0015] The methods described herein may be performed by software in machine-readable form on a tangible storage medium e.g. in the form of a computer program comprising computer program code means adapted to perform all the steps of any of the methods described herein when the program is run on a computer and where the computer program may be embodied on a computer-readable medium. Examples of tangible (or non-transitory) storage media include disks, thumb drives, memory cards etc. and do not include propagated signals. The software can be suitable for execution on a parallel processor or a serial processor such that the method steps may be carried out in any suitable order, or simultaneously.
[0016] This application acknowledges that firmware and software can be valuable, separately tradable commodities. It is intended to encompass software, which runs on or controls "dumb" or standard hardware, to carry out the desired functions. It is also intended to encompass software which "describes" or defines the configuration of hardware, such as HDL (hardware description language) software, as is used for designing silicon chips, or for configuring universal programmable chips, to carry out desired functions.
[0017] The optional features or options described herein may be combined as appropriate, as would be apparent to a skilled person, and may be combined with any of the aspects of the invention.
Brief Description of the Drawings
[0018] Embodiments of the invention will be described, by way of example, with reference to the following drawings, in which: [0019] Figure 1 is a schematic diagram of epipolar geometry of a convergent stereo pair according to aspects of the disclosure; [0020] Figure 2 is a flow diagram of the depth estimation process according to aspects of the disclosure; [0021] Figure 3 is a pictorial diagram of HEALPix/fourpi data structures at different resolutions according to aspects of the disclosure; [0022] Figure 4 is a pictorial diagram of cylindrical bands within fourpis according to aspects of the disclosure; [0023] Figure 5 is a schematic diagram of extracting prealigned images from fourpi data.
[0024] Figure 6 is a pictorial diagram of an aligned stereo image pair according to aspects of the disclosure; [0025] Figure 7 is a pictorial diagram of disparity maps according to aspects of the disclosure; [0026] Figure 8 is a schematic diagram of the relationship between disparity and depth according to aspects of the disclosure; [0027] Figure 9 is a pictorial diagram of a stereo variable baseline ODK mount according to aspects of the disclosure; [0028] Figure 10 is a pictorial diagram of experimental baseline results according to aspects of the disclosure; [0029] Figure 11 is a pictorial diagram of results produced in outdoor and indoor environments according to aspects of the disclosure; [0030] Figure 12 is a pictorial diagram of initial extraction of estimated and ground truth depth according to aspects of the disclosure; and [0031] Figure 13 is a block diagram of a computing device or apparatus suitable for implementing aspects of the disclosure. [0032] Common reference numerals are used throughout the figures to indicate similar features.
Detailed Description
[0033] Embodiments of the present invention are described below by way of example only. These examples represent the suitable modes of putting the invention into practice that are currently known to the Applicant although they are not the only ways in which this could be achieved. The description sets forth the functions of the example and the sequence of steps for constructing and operating the example.
However, the same or equivalent functions and sequences may be accomplished by different examples.
[0034] Depth estimation for a stereo image pair is generated by estimating the pixel disparity between the base image of the stereo image pair and its corresponding matching image. The relative distance between a pixel in the base image and the matching image represents this pixel disparity. Detecting these correspondences involves comparing pixels or patches of pixels for similarity. When done without prior knowledge of the imaging system, every pixel in the base image could match with every pixel in the matching image, leading to a computationally expensive exhaustive search.
[0035] The concept of Epipolar geometry is one way to reduce the search space. Each pixel in the base image can be envisaged as projecting onto the second image as a line. This concept effectively confines the search for correspondences from a 2D to a 1-D space. However, this method necessitates understanding the imaging system, including factors like the mathematical model of the cameras' optics, distortion characteristics, and the relative six-degree-of-freedom pose between the optical centres of the two cameras. Much of this information may not be readily available.
[0036] Stereo rectification simplifies epipolar geometry. Both base and matching images are transformed so that they lie on the same projective plane as a vertically aligned stereo vision system. The epipolar line of a pixel in the base image then corresponds to an entire column of pixels in the matching image, resulting in a simplification. This simplification, otherwise known as stereo rectification, improves the search for correspondences.
[0037] For this reason, most stereo imaging implementations rely on rectifying the input image pair, for example using the OpenCV library. Standard rectification pipelines (in the OpenCV library, for instance) function by extracting features using an algorithm such as SIFT from both images. The pipelines find the corresponding features in both images and calculate the global projective image transforms which would correctly align matched points between the images.
[0038] Performing the feature detection, extraction, matching and the projective transform from the input images to a rectified image pair nevertheless adds computational overhead. The cost of rectification on two images at every frame can be limiting for applications requiring real-time performance at high frame rates. As disparity calculation itself can be computationally intensive, freeing as many resources as possible is desirable.
[0039] Rectification can potentially introduce errors in depth estimation if the projection does not align the images onto the same projective plane. Such errors may arise from various factors: insufficient feature detection, inaccuracies in camera models, incorrect handling of lens distortion, or misidentification of features as correspondences.
[0040] Moreover, in robotic applications where a 360-degree sensing range is preferred, rectification onto a flat plane is not desirable, given that the field of view of the resulting disparity maps is relatively low with directional machine vision cameras.
[0041] The present invention offers an improved alternative for processing stereo images. The present invention includes Opteran's Development Kit (ODK) and the pipeline described herein. In particular, the ODK may employ a customised data structure/projection. This data structure can be obtained from the vertically aligned 360 cameras and inherently places the input images onto the same projective plane. No stereo rectification is thus required. At the same time, using the 360-degree cameras in a vertical alignment, the ODK may generate disparity maps with a field of view of 360 degrees by 90 degrees about the equator. This allows the ODK to be used for implementing high-framerate applications and/or employing more expensive disparity algorithms to enhance the quality of the disparity maps while maintaining a steadily high framerate.
[0042] In a typical implementation, for example, the ODK will generate filtered disparity maps at around 90 frames per second while running on a single CPU. When the camera and distortion model are known through a prior characterisation process, it becomes feasible to calculate the projective transformation using matrices representing the camera parameters, especially with the ODK. In situations where these models are not readily available, they can be estimated, often through tracking techniques such as optical flow analysis of specific target features. This approach contributes to enhancing depth estimation accuracy.
[0043] Further, removing the rectification step also removes a potential source of error during disparity calculation due to incorrect alignment. This leads to higher confidence in the final disparity maps when used for depth estimation. Using a 360-degree pixel data structure in which each pixel has equal area (such as the cylindrical data structure/projection described herein) also provides useful downstream benefits when calculating depth from disparity. In effect, this approach, in combination with the above, simplifies the triangulation problem compared to using other data structures where the variation of pixel size needs to be compensated.
[0044] Furthermore, depth calculation can be made more robust by using the ODK, according to the method shown in Figure 8 with respect to the cylindrical band(s) shown in Figure 4. Each pixel in the cylindrical band(s) shown in the figure can be mapped to a unit vector representing the pixel's position on the spherical fourpi. This means that, instead of using the pixel size directly to calculate the distance between the object in the two images, the two unit vectors can be triangulated to give a depth estimation, provided the distance between the centres of the two fourpis is known. This establishes a baseline distance, which can be estimated from the design of the sensor or characterised precisely.
[0045] Effectively, the method described herein removes the need to know the metric size of each pixel. This removes some reliance on the camera calibration. This is also useful because, when the camera sensor data is transformed into the spherical fourpi, different numbers of sensor pixels are binned into each fourpi pixel, and this number changes for each pixel as the camera rotates and the image is stabilised. This varying number of pixels in each fourpi pixel makes the method more advantageous when processing the spherical fourpi.
[0046] Given the aforementioned advantages, an aspect of the present invention is a general process for depth estimation. The process involves defining a parameter representing a range of potential disparities, such as [-1 to 18] pixels. For every pixel in the input image, a dissimilarity score is computed against the corresponding pixel at each possible disparity in a matching image. This computation generates a cost volume with dimensions (W x H x D). The cost volume is aggregated by analysing one-dimensional lines in various directions across the image, minimising a specified energy function. The sub-pixel disparity level with the lowest cost is interpolated for each pixel to construct a 2D disparity map. Optionally, the disparity map is filtered using the original image as a guide to enhance edge preservation. The disparity map is then used to estimate the depth of the image pair, as illustrated in the sketch below. An exemplary implementation of this process is further described in the following sections.
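The following Python/NumPy sketch illustrates the cost-volume construction and winner-take-all disparity selection just described. It is a minimal illustration only: it uses a simple absolute-intensity-difference cost in place of the sampling-insensitive measure used in the actual pipeline, and the disparity range, array names and edge handling are assumptions made for the example.

```python
import numpy as np

def build_cost_volume(base, match, d_min=-1, d_max=18):
    """Naive (H x W x D) cost volume using absolute intensity differences.

    base, match: greyscale images of shape (H, W) that already lie on the
    same projective plane (e.g. unwrapped cylindrical bands). Pixels for
    which a given disparity is not defined keep an infinite cost.
    """
    base = base.astype(np.float32)
    match = match.astype(np.float32)
    h, w = base.shape
    disparities = np.arange(d_min, d_max + 1)
    cost = np.full((h, w, len(disparities)), np.inf, dtype=np.float32)
    for i, d in enumerate(disparities):
        if d > 0:
            cost[:, d:, i] = np.abs(base[:, d:] - match[:, :-d])
        elif d < 0:
            cost[:, :d, i] = np.abs(base[:, :d] - match[:, -d:])
        else:
            cost[:, :, i] = np.abs(base - match)
    return cost, disparities

def select_disparity(cost, disparities):
    """Winner-take-all: pick the disparity of minimum cost at each pixel."""
    return disparities[np.argmin(cost, axis=2)]
```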
[0047] Here, the input image may be a stable (or stably) preprocessed image. The stable preprocessed image(s) may either be a stereo image with left and right sides or a pair of two separate 2D images referring to a stereo image pair captured from slightly different viewpoints, simulating how human eyes perceive depth.
[0048] The stereo image may undergo stability-enhancing processing to produce a stable preprocessed image. Such processing may utilise one or more data structures/projections to maintain a consistent and rotationally stable field of view for said image. This stabilisation of said image can be accomplished using the method described in PCT/GB2022/051055, published as WO2022248824A1 and herein incorporated by reference. After obtaining the stable preprocessed image, information underlying said image may be extracted to generate one or more disparity maps.
[0049] Disparity map herein refers to a two-dimensional representation of the estimated relative pixel difference caused by the depth between corresponding points in a pair of stereo images. Here, a semi-global block matching algorithm generates the disparity map by comparing pixel correspondences between the left and right images captured by one or more vertically aligned 360 cameras.
[0050] A cylindrical projection herein refers to a mapping or representation of a three-dimensional (3D) space onto a two-dimensional (2D) surface in a way that simulates the view from a cylindrical perspective. This projection displays 3D data or scenes on a flat surface while preserving some of the spatial relationships and characteristics of the original 3D space. The cylindrical projection may be a cross-section of a data structure such as the HEALPix (Hierarchical Equal Area isoLatitude Pixelization)/fourpi described herein.
[0051] Pixel intensity herein refers to a single pixel's brightness or grey level in a digital image. It is a numerical value that quantifies the amount of light at that specific pixel's location within the image. For example, in greyscale or black-and-white images, pixel intensity typically ranges from 0 (representing black or no light) to a maximum value (representing white or maximum light in 8-bit pixel representation). In colour images, pixel intensity values can vary across multiple channels (e.g., red, green, and blue in an RGB image), representing the intensity of each colour component at the pixel's location.
[0052] Pixel dissimilarity measure herein refers to a quantitative metric used to assess the difference or dissimilarity between two pixels in a digital image. It provides a numerical value that indicates how different the pixel values are from each other, often based on factors such as intensity, texture, or other image characteristics.
[0053] Cost function is an objective function or loss function that can be used to perform optimization and quantify how well a model's predictions or outputs match the true or desired values. It serves as a measure of the "cost" or "error" associated with the model's performance. Here, the cost function is minimised to obtain the disparity map.
[0054] Cost volume herein refers to a data structure that stores a collection of cost values representing the dissimilarity or cost of matching pixels or image patches between a pair of stereo images at different disparity levels.
[0055] Disparity value herein refers to a numerical measurement representing the horizontal or lateral shift between corresponding points or features in a pair of stereo images. It quantifies the apparent difference in the horizontal position of an object or point as seen from two different viewpoints. Disparity value may be represented in the disparity map.
[0056] Outliers herein refer to pixels or regions within a disparity map whose disparity values deviate significantly from the expected or typical disparity values of the scene, i.e., values noticeably different from the surrounding areas, often due to errors or challenges in the stereo matching process. Outliers introduce inaccuracies into the depth estimation or 3D reconstruction.
Detailed Implementation [0057] In one exemplary implementation corresponding to one or more aspects of the present invention, the input images are captured from the pair of ODK cameras via a stream such as the "/fourpi_see/cylindrical_heading_aligned" stream to provide an aligned top/bottom pair, herein known as a stereo image pair. An example capture setup is shown in Figure 1. A semi-global block matching (SGBM) algorithm may be used to produce disparity maps, and a weighted least squares disparity (WLS) filter may be applied to preserve edges present in the stereo image pair.
[0058] Semi-global block matching (SGBM) is an extension of the original semi-global matching (SGM) algorithm proposed by H. Hirschmuller in "Stereo processing by semiglobal matching and mutual information," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 2, pp. 328-341, 2008. It operates in a pixel-wise fashion, matching image patches referred to as blocks between a pair of images with known epipolar geometry. Implementations in packages such as OpenCV require that the images be a left-right rectified pair, i.e., the epipolar lines of the base image are rows of pixels in the matched image. The SGBM algorithm operates in steps 1 to 6: 1. Pre-processing; 2. Cost calculation; 3. Cost aggregation; 4. Disparity selection; 5. Left-right consistency checks; and 6. Post-processing. These steps are further described in the following sections.
[0059] For use with the ODK, the fourpi cylindrical heading aligned publishers are subscribed to and their messages are converted to NumPy arrays (implemented in C++/ros2); the transpose of the top-bottom image pair is taken to create an approximately rectified right-left pair, which is then passed to the first stage of the algorithmic pipeline. Here, "cv::Mat" is used to store/process the image data and "sensor_msgs::msg::Image" is used to receive/publish the data.
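A minimal sketch of this stage using the OpenCV Python bindings is shown below. The file names, the choice of which band is treated as the left/right view, and all numeric parameters are illustrative assumptions rather than the values used by the ODK, and the WLS filter requires the opencv-contrib ximgproc module.

```python
import cv2
import numpy as np

# Placeholders for the top and bottom cylindrical band images received
# from the fourpi publishers (here simply loaded from disk as greyscale).
top_img = cv2.imread("top_band.png", cv2.IMREAD_GRAYSCALE)
bottom_img = cv2.imread("bottom_band.png", cv2.IMREAD_GRAYSCALE)

# Transpose the top-bottom pair so that disparity runs horizontally,
# giving an approximately rectified right-left pair.
left = np.ascontiguousarray(bottom_img.T)
right = np.ascontiguousarray(top_img.T)

# Semi-global block matching; the disparity range and penalties are illustrative.
sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=16,   # OpenCV requires a multiple of 16
    blockSize=5,
    P1=8 * 5 * 5,        # penalty for small disparity changes
    P2=32 * 5 * 5,       # penalty for large disparity changes
    uniquenessRatio=10,
    speckleWindowSize=50,
    speckleRange=2,
)
disp_left = sgbm.compute(left, right)

# Edge-preserving WLS filtering guided by the left view
# (requires the opencv-contrib-python package).
right_matcher = cv2.ximgproc.createRightMatcher(sgbm)
disp_right = right_matcher.compute(right, left)
wls = cv2.ximgproc.createDisparityWLSFilter(matcher_left=sgbm)
wls.setLambda(8000.0)
wls.setSigmaColor(1.2)
filtered = wls.filter(disp_left, left, disparity_map_right=disp_right)
```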
[0060] 1. Preprocessing: First, input images are converted to greyscale as the cost calculation is based only on pixel intensity. An XSobel filter is then applied (the parameters used are not documented). 2. Cost calculation: The original SGM paper recommends calculating cost based on mutual information (MI), where the MI between the base image I_b and the matched image I_m is defined in terms of their individual and shared entropies,

MI_{I_b, I_m} = H_{I_b} + H_{I_m} - H_{I_b, I_m}    (1)

[0061] where H_I is the information entropy of an image, and H_{I_b, I_m} is the joint information entropy between the base and matching image. However, although this approach can lead to improved results due to insensitivity to changes in illumination, a simpler cost function is implemented in OpenCV's SGBM: the sub-pixel sampling-insensitive measure proposed by S. Birchfield and C. Tomasi, "A pixel dissimilarity measure that is insensitive to image sampling," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 4, pp. 401-406, 1998. For a given row of pixels, R_b and R_m represent the one-dimensional arrays of pixel intensities in the base and matched images, respectively. The goal of the cost function is to compute the dissimilarity between the pixel at position x_b in R_b and the pixel at x_m in R_m. The intensity arrays are first linearly interpolated to create two new arrays, notated \hat{R}_b and \hat{R}_m. To see how well the intensity at x_b matches the interpolated region around x_m, the following is computed,

\bar{d}(x_b, x_m) = \min_{x_m - 1/2 \le x \le x_m + 1/2} | R_b(x_b) - \hat{R}_m(x) |    (2)

[0062] The equivalent quantity \bar{d}(x_m, x_b) can be determined symmetrically using \hat{R}_b and R_m. The dissimilarity is then given by the minimum of the two values,

d(x_b, x_m) = \min( \bar{d}(x_b, x_m), \bar{d}(x_m, x_b) )    (3)

[0063] The dissimilarity can be computed efficiently by first computing the linearly interpolated intensity between x_m and (x - 1)_m,

I_m^- = ( R_m(x_m) + R_m(x_m - 1) ) / 2    (4)

[0064] and the symmetric calculation for the interpolated intensity between x_m and (x + 1)_m,

I_m^+ = ( R_m(x_m) + R_m(x_m + 1) ) / 2    (5)

[0065] Letting,

I_min = \min( I_m^-, I_m^+, R_m(x_m) )    (6)

[0066] and,

I_max = \max( I_m^-, I_m^+, R_m(x_m) )    (7)

[0067] allows \bar{d}(x_b, x_m) to be calculated simply from,

\bar{d}(x_b, x_m) = \max( 0, R_b(x_b) - I_max, I_min - R_b(x_b) )    (8)

[0068] As before, the symmetric equation can be used to find \bar{d}(x_m, x_b) before the final dissimilarity is calculated by Equation (3). This calculation takes only a small constant amount of extra time to compute compared with the minimum absolute intensity difference (empirically, approximately 10% more). The dissimilarity function can be calculated for each pixel pair at each disparity level (set as a parameter of the algorithm) to give a matching cost at each disparity level, and the minimum cost implies a more likely match. Figure 7 shows a disparity map 701 found by taking the disparity of minimum cost at each pixel from the input images given in Figure 6.
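A short Python sketch of the Birchfield-Tomasi dissimilarity of Equations (2) to (8) is given below, computed for a single pair of pixel positions on two scanlines. The boundary handling (clamping at the row ends) is a simplifying assumption made for the example.

```python
import numpy as np

def bt_dissimilarity(R_b, R_m, x_b, x_m):
    """Sampling-insensitive dissimilarity d(x_b, x_m), per Equations (2)-(8).

    R_b, R_m: 1D arrays of pixel intensities for the base and matched rows.
    x_b, x_m: pixel positions being compared. Boundaries are clamped.
    """
    def one_sided(R_a, x_a, R_c, x_c):
        centre = float(R_c[x_c])
        # Interpolated intensities halfway towards each neighbour (Eqs (4)-(5)).
        i_minus = 0.5 * (centre + float(R_c[max(x_c - 1, 0)]))
        i_plus = 0.5 * (centre + float(R_c[min(x_c + 1, len(R_c) - 1)]))
        i_min = min(i_minus, i_plus, centre)   # Eq (6)
        i_max = max(i_minus, i_plus, centre)   # Eq (7)
        # Distance of R_a(x_a) from the interpolated interval (Eq (8)).
        a = float(R_a[x_a])
        return max(0.0, a - i_max, i_min - a)

    d_bm = one_sided(R_b, x_b, R_m, x_m)   # d_bar(x_b, x_m)
    d_mb = one_sided(R_m, x_m, R_b, x_b)   # d_bar(x_m, x_b)
    return min(d_bm, d_mb)                 # Eq (3)
```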
[0069] 3. Cost aggregation: Using the pixel-wise cost function alone is often ambiguous. Therefore, an energy function is defined over the disparity image D which penalises local disparity changes. The energy function E(D) is given by,

E(D) = \sum_p ( C(p, D_p) + \sum_{q \in N_p} P_1 T[ |D_p - D_q| = 1 ] + \sum_{q \in N_p} P_2 T[ |D_p - D_q| > 1 ] )    (9)

[0070] where the first term gives the sum of the cost function over the disparities, the second term adds a penalty P_1 for small (one-pixel) disparity changes in the neighbouring pixels, and the third term gives a larger penalty P_2 for larger disparity changes in the neighbouring pixels; T[.] is an indicator function equal to 1 when its argument is true and 0 otherwise. The goal, then, is to find the disparity map D which minimises E(D). Unfortunately, performing a 2D global minimisation is NP-complete for many energies. Instead, performing the minimisation along 1D image slices can be computed in polynomial time. Therefore, matching costs are aggregated for pixel p along radial lines in all directions equally. The aggregated cost is then given by the sum of all 1D minimum cost paths which end at pixel p with disparity d. The cost L_r(p, d) along the path in direction r is defined recursively,

L_r(p, d) = C(p, d) + \min( L_r(p - r, d), L_r(p - r, d - 1) + P_1, L_r(p - r, d + 1) + P_1, \min_i L_r(p - r, i) + P_2 ) - \min_k L_r(p - r, k)    (10)

[0071] This implements the behaviour of Equation (9) along an arbitrary 1D path. The final smoothed (aggregated) matching cost is given as the sum of costs along all lines r, i.e.,

S(p, d) = \sum_r L_r(p, d)    (11)

[0072] A suggested implementation from the original SGM paper is as follows: 1) Pre-calculate the matching cost at each pixel and disparity, stored as an 11-bit value in an array of size [W x H x D], where W and H are the image width and height respectively and D is the number of disparities considered; 2) Create a second array of the same size, initialised at zeros, to store the aggregated costs; 3) For each direction r, begin with the image border pixels, where L_r(p, d) = C(p, d), then traverse the path in the forward direction according to Equation (10), incrementing the aggregate array by the calculated energy path value each time.
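The sketch below aggregates the cost volume along a single direction (left to right) following the recursion of Equation (10); summing the result over several directions gives S(p, d) of Equation (11). The penalty values and the use of NumPy rolling for the +/-1 disparity terms are assumptions made for illustration.

```python
import numpy as np

def aggregate_left_to_right(cost, P1=8.0, P2=32.0):
    """Aggregate matching costs along one path direction per Equation (10).

    cost: array of shape (H, W, D). Returns L_r of the same shape.
    """
    h, w, d_levels = cost.shape
    L = np.zeros_like(cost, dtype=np.float32)
    L[:, 0, :] = cost[:, 0, :]                        # border initialisation
    for x in range(1, w):
        prev = L[:, x - 1, :]
        prev_min = prev.min(axis=1, keepdims=True)    # min_k L_r(p-r, k)
        same = prev                                   # L_r(p-r, d)
        up = np.roll(prev, -1, axis=1) + P1           # L_r(p-r, d+1) + P1
        down = np.roll(prev, 1, axis=1) + P1          # L_r(p-r, d-1) + P1
        up[:, -1] = np.inf                            # invalidate wrap-around
        down[:, 0] = np.inf
        jump = prev_min + P2                          # min_i L_r(p-r, i) + P2
        best = np.minimum(np.minimum(same, up), np.minimum(down, jump))
        L[:, x, :] = cost[:, x, :] + best - prev_min
    return L
```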
[0073] The original SGM paper recommends using 16 directions or a minimum of 8 directions; the OpenCV implementation uses 4 directions by default, but this number can be changed as a parameter of the algorithm. Each pixel is visited r times, leading to O(W H D) time complexity. The regular data structure and simple arithmetic operations allow for highly parallel computation. Figure 7 shows the disparity map 703 obtained after the cost volume used to produce disparity map 701 has been aggregated.
[0074] 4. Disparity selection: The disparity image arising from I_b can be found by selecting, for each pixel, the disparity of minimum cost. If sub-pixel disparity estimation is required, the extremum of a curve fit in disparity space can be used. It is common to use a quadratic for simple computation, although there is no reason an alternative curve such as a Gaussian could not be used. To calculate the disparity image arising from I_m, the same costs can be used by traversing the epipolar line corresponding to each pixel in the matched image, again selecting the disparity with minimum cost. However, better results can be achieved by calculating D_m from scratch, i.e., using I_m as the base image and I_b as the matched image; it is application-dependent whether the extra computational cost is worth the increased confidence in the resultant disparity maps. Finally, outliers are filtered from both D_b and D_m using a median filter with a 3 x 3 kernel.
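A minimal sketch of the quadratic sub-pixel refinement is given below: a parabola is fitted through the aggregated costs at the winning disparity and its two neighbours, and the vertex gives the sub-pixel offset. The handling of pixels whose minimum lies at the edge of the disparity range is a simplification for the example.

```python
import numpy as np

def subpixel_disparity(S):
    """Winner-take-all disparity with quadratic sub-pixel refinement.

    S: aggregated cost volume of shape (H, W, D). Returns a float32
    disparity index map; edge-of-range winners keep their integer value.
    """
    h, w, n = S.shape
    d_int = np.argmin(S, axis=2)
    ys, xs = np.mgrid[0:h, 0:w]
    c0 = S[ys, xs, np.clip(d_int - 1, 0, n - 1)]
    c1 = S[ys, xs, d_int]
    c2 = S[ys, xs, np.clip(d_int + 1, 0, n - 1)]
    denom = c0 - 2.0 * c1 + c2
    interior = (d_int > 0) & (d_int < n - 1) & (np.abs(denom) > 1e-6)
    # Vertex of the parabola through (-1, c0), (0, c1), (1, c2).
    offset = np.where(interior, (c0 - c2) / (2.0 * denom + 1e-12), 0.0)
    return d_int.astype(np.float32) + offset.astype(np.float32)
```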
[0075] 5. Left-right consistency check: Each disparity in D_b is compared to the corresponding disparity in D_m and checked for equality. A uniqueness constraint is also applied during the check, ensuring the disparity map only contains one-to-one mappings.
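A sketch of a simple consistency check is shown below; the sign convention (whether a pixel at x in the base row maps to x - d or x + d in the matched row) depends on which image is treated as the base, so the convention used here is an assumption.

```python
import numpy as np

def consistency_check(D_b, D_m, tol=1.0):
    """Invalidate pixels whose base and matched disparities disagree.

    D_b, D_m: disparity maps computed with each image in turn as the base.
    Pixels failing the check are set to -1 (invalid).
    """
    h, w = D_b.shape
    xs = np.broadcast_to(np.arange(w), (h, w))
    # Pixel x in the base view is claimed to match pixel x - d in the other view.
    target = np.clip(xs - np.round(D_b).astype(int), 0, w - 1)
    back = np.take_along_axis(D_m, target, axis=1)
    ok = np.abs(D_b - back) <= tol
    return np.where(ok, D_b, -1)
```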
[0076] 6. Post-processing: Areas of low texture can lead to outliers, which often manifest as small patches of large disparity compared to surrounding pixels. To remove these, the image can be segmented into areas that vary by more than a pixel; if these areas are very small, they can be removed. The window size at which peaks are filtered is set as a parameter of the algorithm.
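OpenCV provides a speckle filter that implements this segmentation-based removal; a brief sketch of its use is below. The window size and disparity-variation threshold are illustrative, and the x16 fixed-point scaling assumes the disparity map came from StereoSGBM.

```python
import cv2
import numpy as np

# Placeholder 16-bit fixed-point disparity map (StereoSGBM output is scaled by 16).
disparity = np.zeros((64, 256), dtype=np.int16)

new_val = 0             # value written into removed speckle regions
max_speckle_size = 50   # connected regions smaller than this are removed
max_diff = 16           # allowed disparity variation within a region (1 px at x16)

# Operates in place, removing small patches of inconsistent disparity.
cv2.filterSpeckles(disparity, new_val, max_speckle_size, max_diff)
```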
Weighted least squares filtering
[0077] To improve the disparity map D_b, the original base image I_b can be used as a guidance image in a WLS energy minimisation problem described by the energy function J(U_b), where U_b is the desired output disparity map, as follows,

J(U_b) = \sum_p [ ( U_b(p) - D_b(p) )^2 + \lambda \sum_{q \in N(p)} w_{p,q}(I_b) ( U_b(p) - U_b(q) )^2 ]    (12)

[0078] where \lambda is a parameter controlling the influence of the guidance image; large values of \lambda lead to high smoothing. Typically \lambda is around 8000. Here the function w_{p,q}(I_b) is a spatially varying weighting function defined by,

w_{p,q}(I_b) = \exp( - || I_b(p) - I_b(q) || / \sigma )    (13)

[0079] where \sigma is a parameter typically set in the range 0.8 < \sigma < 1.7, controlling how sensitive the weights are to edges in the guidance image. By setting the gradient of Equation (12) to zero, the U_b which minimises J(U_b) can be found by solving the linear system,

( I + \lambda A ) U_b = D_b    (14)

[0080] where I is the identity matrix and A is a (W x H)^2 sparse Laplacian matrix defined by,

A_{m,n} = \sum_{k \in N(m)} w_{m,k}(I_b) if m = n;  A_{m,n} = -w_{m,n}(I_b) if n \in N(m);  A_{m,n} = 0 otherwise    (15)

[0081] where m, n are indices corresponding to each pixel. Therefore, the final smoothing function can be calculated by rearranging Equation (14) to,

U_b = ( I + \lambda A )^{-1} D_b    (16)

[0082] 1) OpenCV Implementation: In OpenCV, the WLS is implemented as described in Min et al., "Fast global image smoothing based on weighted least squares," IEEE Trans. Image Process., vol. 23, no. 12, pp. 5638-5653, 2014, where the function is decomposed into each spatial dimension, allowing the matrix to be solved with a combination of 1D fast solvers. Taking some one-dimensional row D_h from the input disparity map D_b with pixels d_x, and the equivalent row from the guide image I_b with pixels g_x, the linear system given in Equation (14) can be rewritten for the one-dimensional solution as,

( I_h + \lambda A_h ) U_h = D_h    (17)

[0083] where U_h is a horizontal row from the smoothed output disparity map, I_h is a (W x W) identity matrix and A_h is a three-point Laplacian matrix. Defining the neighbours of some pixel x to be only the pixels immediately to its left and right, (x - 1, x + 1), allows Equation (17) to be written in terms of a tridiagonal matrix as,

a_x U_h(x - 1) + b_x U_h(x) + c_x U_h(x + 1) = d_x    (18)

[0084] where the first and last rows provide the boundary conditions, and a_x, b_x, c_x represent the non-zero elements of the x-th row of ( I_h + \lambda A_h ), given by,

a_x = -\lambda w_{x, x-1}(g),  b_x = 1 + \lambda ( w_{x, x-1}(g) + w_{x, x+1}(g) ),  c_x = -\lambda w_{x, x+1}(g)

[0085] With these definitions, the matrix can be solved efficiently using Gaussian elimination in order O(N) time, where N = W. The one-dimensional solver is conducted for each row and each column. This process is iterated to prevent streaking artefacts by adjusting the value of \lambda each time. With each iteration, a horizontal and vertical pass is conducted before updating \lambda according to,

\lambda_t = (3/2) \lambda 4^{T - t} / ( 4^T - 1 )    (19)

[0086] where t is the current iteration and T is the total number of iterations. Figure 7 shows the result of applying WLS filtering 705 to the disparity map 703, using the top image of Figure 6 as the guide image.
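The sketch below shows a single horizontal pass of the separable WLS solver: the tridiagonal system of Equation (18) is built from edge-aware weights on the guide row and solved with the Thomas algorithm. The weight definition, the lambda and sigma values, and the boundary handling are illustrative assumptions.

```python
import numpy as np

def wls_row(d_row, g_row, lam=8000.0, sigma=1.2):
    """One horizontal pass of the WLS smoother (Equations (17)-(18)).

    d_row: one row of the input disparity map.
    g_row: the corresponding row of the guidance image.
    """
    n = len(d_row)
    # Edge-aware weights between neighbouring pixels of the guide row (Eq (13)).
    wgt = np.exp(-np.abs(np.diff(g_row.astype(np.float64))) / sigma)  # length n-1
    a = np.zeros(n)   # sub-diagonal   a_x = -lambda * w_{x,x-1}
    b = np.ones(n)    # diagonal       b_x = 1 + lambda * (w_{x,x-1} + w_{x,x+1})
    c = np.zeros(n)   # super-diagonal c_x = -lambda * w_{x,x+1}
    a[1:] = -lam * wgt
    c[:-1] = -lam * wgt
    b[1:] += lam * wgt
    b[:-1] += lam * wgt
    rhs = d_row.astype(np.float64).copy()

    # Thomas algorithm: forward elimination followed by back substitution.
    for i in range(1, n):
        m = a[i] / b[i - 1]
        b[i] -= m * c[i - 1]
        rhs[i] -= m * rhs[i - 1]
    u = np.empty(n)
    u[-1] = rhs[-1] / b[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] - c[i] * u[i + 1]) / b[i]
    return u
```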
Depth calculation
[0087] Calculating depth from a disparity map is relatively simple. Consider the diagram shown in Figure 8. On the left are two similar triangles: one with a base equal to the stereo baseline b (the distance between the two cameras' principal points, as shown in Figure 1), and the other with a base equal to the baseline minus the pixel disparity (x - x'). If the baseline distance is known, the depth z can therefore be calculated from,

z = f b / ( x - x' )    (20)

[0088] where f is the focal length expressed in pixels and (x - x') is the pixel disparity. Applying this depth calculation to the disparity map 705 from Figure 7 based on Equation (20) results in the depth map shown in an updated map 707 at the bottom of the figure.
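A worked example of Equation (20) is given below; the focal length, baseline and disparity values are purely illustrative numbers, not ODK parameters.

```python
def depth_from_disparity(disparity_px, baseline_m, focal_px):
    """Pinhole depth from disparity per Equation (20): z = f * b / (x - x')."""
    return focal_px * baseline_m / disparity_px

# Example: focal length 100 px, baseline 0.12 m, disparity 4 px
# gives z = 100 * 0.12 / 4 = 3.0 m.
print(depth_from_disparity(4.0, 0.12, 100.0))
```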
[0089] Preferably, the depth calculation is performed using the known pixel-to-unit-vector mapping provided by the fourpi data structure, as also shown in Figure 8. With this method, pixel (and sub-pixel) coordinates in the fourpi can be transformed into a unit vector that originates from the fourpi centre and points to that location on the fourpi surface. These unit vectors can be represented in polar coordinates, providing a pair of angles that represent the azimuth and elevation of a given pixel on the spherical surface. Triangulation can be conducted by casting rays from the origin of each fourpi along the directions defined by these unit vectors and calculating where the rays intersect. Here, the depth is given relative to the top fourpi, but this can be changed (e.g., to the centre of the baseline) depending on the application. This method is advantageous as it does not require knowledge of the metric size of each pixel, which can vary due to the pixel binning algorithm used when constructing the fourpi data structure; this has advantages in algorithm robustness and implementation complexity. By removing the reliance on camera calibration and the need for the metric size of the pixels, the method streamlines the computation, working synergistically with the spherical fourpi data structure.
[0090] Figure 1 is a schematic diagram showing epipolar geometry of a convergent stereo pair. The figure presents how a single pixel in the base image can be projected onto the second image as a line through a stereo camera setup comprising two 360-degree cameras in a vertical alignment, effectively reducing the search space for correspondences from 2D to 1D as explained in the above sections. Also shown in the figure are the stereo baseline 105 and epipolar line 107 for the purpose of conducting the calculations as described below for generating the disparity map according to one or more aspects of the present invention.
[0091] In the stereo camera setup as shown, each 360-degree camera 101, 103 contains two sensors and two lenses, each covering half of the field of view. Each sensor and lens pair could be characterised using, for example, the Brown-Conrady model, which describes the intrinsic properties of the camera and the distortion introduced by the lens. The pixel data from each sensor can be stabilised and aligned to produce stable preprocessed image(s). The pixel data may be compiled into a 360-degree representation(s) through a hardware implementation in real time. The 360 representation(s) may comprise a cylindrical data structure.
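As an illustration of the camera characterisation mentioned above, the following sketch applies the Brown-Conrady radial and tangential distortion model to normalised image coordinates; the coefficient names follow the usual convention and their values would come from a camera-specific characterisation, not from this document.

```python
def brown_conrady_distort(x, y, k1, k2, k3, p1, p2):
    """Apply Brown-Conrady distortion to normalised image coordinates (x, y).

    k1, k2, k3 are radial coefficients; p1, p2 are tangential coefficients.
    """
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d
```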
[0092] Based on this camera setup, it is plausible to characterise and correct lens vignetting effects, utilising a specialised lightbox and implementing a curve fitting algorithm. This eliminates undesired artifacts that tend to appear near the periphery of the lens. These artifacts can be notably pronounced if not properly addressed, and our approach effectively mitigates their impact on the processed image. For example, the data may be compiled into the Opteran fourpi data structure, which is a derivative of the HEALPix data format. The process is not dependent on the pixel data structure used. However, the fourpi data structure has some useful properties. Fourpis are a representation of 360-degree image data where each pixel covers an equal area on a sphere, as shown in Figure 3. Such a representation provides the benefits of high-speed and low-cost visual-only computation.
[0093] Figure 2 is a flow diagram of the depth estimation process 200 according to aspects of the disclosure. Process 200 pertains to image depth estimation. It involves acquiring rotationally stabilised preprocessed images from ODK cameras and using the images to generate a disparity map shown in Figure 7 for deriving depth measurements, as further described in the following steps.
[0094] In step 201, one or more stable preprocessed images are acquired based on stereo images captured by ODK cameras, as shown in Figure 1. Said one or more stable preprocessed images are rotationally stabilised from input stereo images (pair) through a process described herein.
[0095] An example of obtaining these stable preprocessed images involves transforming the images captured by ODK cameras into input images arranged in a cylindrical data structure, where the cylindrical data structure is a cylindrical projection derived from a data structure of raw images. The structure is rotationally stable.
[0096] During this transformation, the pixel intensity of the input images can be modified and filtered to achieve rotational stability. An exemplary embodiment involves receiving one or more input images of a cylindrical data structure, reducing the pixel intensity of said one or more input images, and filtering said one or more input images to obtain said one or more stable preprocessed images used in the following steps. Said one or more input images may be captured by one or more vertically aligned 360-degree cameras.
[0097] Optionally, disparity consistency is checked between the left and right images of said one or more stable preprocessed images. This provides improved accuracy when generating the disparity map.
[0098] In step 203, at least one disparity map is generated based on said one or more stable preprocessed images. Said at least one disparity map may be generated by applying a semi-global block matching algorithm to said one or more stable preprocessed images.
[0099] The semi-global block matching algorithm may comprise a cost function based on at least one pixel dissimilarity measure that is insensitive to image sampling or changes in image resolution. The cost function may be configured to generate the disparity map based on a calculated cost volume of said one or more stable preprocessed images.
[00100] An example of at least one disparity map generated using the semi-global block matching algorithm incorporates a cost function, which relies on a pixel dissimilarity measure that remains unaffected by image sampling or alterations in image resolution. This cost function is designed to produce the disparity map by utilising a computed cost volume derived from the set of stable preprocessed images.
[00101] In another example, to generate said at least one disparity map, process 200 further comprising: computing a cost at each pixel of said one or more stable preprocessed images; calculating a cost volume based on the computed costs; identifying at least one disparity value with minimal cost based on the cost volume; and selecting a disparity value with lowest cost for each pixel.
[00102] In step 205, said at least one disparity map is filtered to preserve edges present in the stereo image pair. Said at least one disparity map may be filtered by applying a weighted least squares disparity filter to it. The filtering process may be performed by comparing with the original image.
[00103] An example of filtering may employ the weighted least squares disparity (WLS) filter as described herein, such that the ODK may generate filtered disparity maps at around 90 frames per second while running on a single CPU.
[00104] In another example, any outliers present in the generated disparity map are detected and subsequently eliminated. This removal process may involve applying a segmentation method, as described herein, to segment the image into distinct areas where variations exceeding one pixel for instance are identified and removed.
[00105] In step 207, depth measurements for said one or more stable preprocessed images are calculated based on the filtered disparity map. For example, the depth may be calculated using the method described with reference to Figure 8, using the disparity map shown in Figure 7.
[00106] Calculation can be adapted for processing the spherical fourpi representation/data structure, where the number of pixels in the data structure varies. In this context, varying numbers of sensor pixels are grouped into each fourpi pixel, and this count changes as the camera rotates and the image stabilises. The varying number of pixels in each fourpi pixel makes the calculation for depth measurements more efficient or advantageous.
[00107] The depth measurements are outputted and are to be used in a robotic or other practical application. For example, the output may be used to guide a device/robot traversing an environment or moving in a simulated environment, where the direction of movement for the device is readily determined based on the depth estimation in real-time.
[00108] Figure 3 is a pictorial diagram showing HEALPix/fourpi data structures at different resolutions. HEALPix/fourpi data structures, as shown, are grid-based representations of spherical data. These data structures can be stored efficiently and used for analysing information over the entire representation. The data structures are hierarchical, referring to their multi-resolution nature, where the sphere is recursively subdivided into smaller pixels. They are equal-area, which ensures that each pixel covers an equal area on the sphere. They are also isolatitude, such that the grid is designed so that lines of constant latitude cross a consistent number of pixels at each level of the hierarchy. In relation to HEALPix, fourpi refers to the entire sphere (4π steradians) covered by this pixelization scheme. Fourpi offers a way to represent and analyse spherical data comprehensively.
[00109] HEALPix/fourpi herein refers to Opteran fourpi data structure, which is particularly useful for handling data on the sphere because it provides an equal-area partitioning, which is important for statistical analyses in the present application. It also supports fast and efficient operations for transforming data between pixel space and spherical harmonic space, which is essential for various types of spherical data analysis.
[00110] Figure 4 is a pictorial diagram of cylindrical bands within the fourpi data structure. Here, 2D projections can be taken; various projections are implemented in the opteran_fourpi_tools library. "Band_0" 401, "band_1" 403, and "band_2" 405 are present. Particularly useful to stereo vision are the three cylindrical bands of pixels about each axis; these pixel bands are shown. A cylindrical image projection can be extracted from the central bands of both fourpi pixel sets, as shown by "band_0" 401.
[00111] Because the image data are prealigned before being stored in the fourpi, both cylinders are guaranteed to have parallel axes no matter the pose of each stereo camera. Therefore, if the cameras are perfectly vertically aligned, the vertical bands of the fourpi data will be co-cylindrical. When the cylindrical data is unwrapped into a rectangular image, the resulting images are therefore guaranteed to be coplanar. The process of unwrapping the central bands for stereo computation is visualised. From these unwrapped images, disparity can be computed using standard disparity algorithms without the need for rectification of the input images. The resulting disparity map represents 360 degrees about the system's central axis and 90 degrees about the equator. It would also be possible to create a system with a horizontal arrangement; however, this would produce disparity maps with the field of view captured by "band_2" 405, so the vertical arrangement is likely to be the most useful. The system handles small misalignments of the cameras' vertical axes well: due to the low resolution of the imaging systems, the change in the image data is small, and standard disparity algorithms can compensate for small misalignments.
Using inertial measurement unit data, the change in vertical stereo baseline can also be compensated for. However, as the misalignment increases, at around 45 degrees the results are noticeably affected before becoming meaningless at 90 degrees.
[00112] Figure 5 is a schematic diagram of extracting pre-aligned images from HEALPix/fourpi data via a cylindrical projection. The process starts from 360-degree stabilised stereo fourpi data structures 501, which are captured by ODK cameras and processed based on their stereo baseline distance; this allows the images to be rotationally stable, and they are referred to as stable preprocessed images. A cylindrical projection/data structure 503 is derived from the fourpi data structures about the shared axis, from which the disparity maps 505 are produced as shown.
[00113] From the produced disparity maps 505, depth estimation can be performed through triangulation, using the known stereo baseline, characterised camera parameters and the known pixel size. A benefit of using the fourpi data structure during the calculation is that each pixel is equal area on the projective image and as such the triangulation calculation is simple and does not need to account for higher densities of pixels near the poles.
[00114] An exemplary implementation of this approach could produce disparity maps of 256x64 pixels using OpenCV disparity calculation on unrectified images, operating on a single core of an Intel i7-12700H at 90 frames per second, which is equal to the framerate of the aligned image data received from the camera. At a stereo baseline of 20 cm, the resulting depth maps, calculated via triangulation, can resolve objects at distances within 10 m, although depth uncertainty increases with absolute distance.
[00115] Figure 6 is a pictorial diagram of an aligned stereo image pair. Images (top and bottom) are captured from the pair of ODK cameras and aligned to orientation. The images shown are used to produce the disparity maps shown in Figure 7. [00116] Figure 7 is a pictorial diagram of disparity maps. The top disparity map 701 is derived by taking the disparity of minimum cost at each pixel from the input images given in Figure 6. The second disparity map 703 from the top is derived from the top disparity map 701 after the cost volume used to produce it has been aggregated.
The third disparity map 705 from the top is derived by applying WLS filtering to the second disparity map 703, using the top image of Figure 6 as a guide. The final image shows the depth map, a depiction of the depth estimation, calculated by applying Equation (20) to the third disparity map 705.
[00117] Figure 8 is a schematic diagram of the relationship between disparity and depth. Based on this relationship, depth can be estimated using either the first method 801 or, preferably, the second method 803. It is appreciated that the same depth calculation, applying either the first method 801 or the second method 803, can be applied to the disparity maps derived from the images in Figure 6, yielding the depth map shown in Figure 7.
[00118] In the first method 801, on the left schematic, two similar triangles can be constructed: one with a base equal to the stereo baseline (b, the distance between the two cameras' principal points), and the other with a base equal to the baseline minus the pixel disparity (x - x'), not shown. Given the known baseline distance, the depth (z) can be calculated using Equation (20).
[00119] The second method 803, on the right schematic, presents the triangulation problem. Shown are two fourpi data structures, fourpi_1 and fourpi_2. The angles marked θ (theta) and φ (phi) can be calculated directly from the elevation angles because the cylindrical bands are vertically aligned. Theta is directly the elevation of the pixel in the base fourpi, while phi is 180 degrees minus the elevation of the pixel in the base image minus the disparity. Assuming the baseline distance b is known, the depth relative to the top fourpi can be calculated using the sine rule as z = b sin(θ) / sin(θ + φ). A disparity of zero leads to parallel rays, meaning the depth is greater than the maximum resolvable by the optical system.
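A small sketch of the sine-rule triangulation used by the second method is given below. It implements the relation stated in the text, z = b sin(θ) / sin(θ + φ), under the angle conventions described above, and treats a vanishing sin(θ + φ) (parallel rays) as an unresolvable depth; the numeric inputs are illustrative.

```python
import numpy as np

def depth_from_angles(theta_deg, phi_deg, baseline_m):
    """Depth relative to the top fourpi via the sine rule: z = b*sin(theta)/sin(theta+phi)."""
    theta = np.radians(theta_deg)
    phi = np.radians(phi_deg)
    denom = np.sin(theta + phi)
    if abs(denom) < 1e-9:
        return float("inf")   # parallel rays: beyond the resolvable range
    return baseline_m * np.sin(theta) / denom
```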
[00120] Figure 9 is a pictorial diagram of stereo variable baseline ODK mount.
The depth estimation pipeline is tested in a variety of environments. First the baseline is varied and the impact on the sensing range is determined. A mount was additively manufactured which can hold two ODK cameras at variable distances from each other. The mount is shown in the minimum baseline configuration 901 on the left and at the maximum baseline configuration 903 on the right. Using this mount, the minimum and maximum distances at which an object can be detected were determined. An empty monitor box was pushed toward the sensor until the disparity exceeded the maximum detectable disparity (set to 14 pixels). The box was then pushed away from the sensor until it could no longer be distinguished from the background. This process was repeated three times.
[00121] Alternatively, we may employ a fixed baseline with four cameras integrated into a single unit. In this setup, the computations are performed directly on the device itself, albeit at a reduced framerate of approximately 20 frames per second. This integrated arrangement effectively improves the results.
[00122] Figure 10 is a pictorial diagram of experimental baseline results in a graph. It shows the mean and range of the minimum and maximum distances at which meaningful depth data can be extracted. As can be seen, at the minimum baseline of 17.5 mm the algorithm could detect objects between 5 cm and 46 cm. At the largest baseline of 120 mm, it could distinguish an object between 60.5 cm and 4.15 m. The current design of the ODK3 has a baseline distance of approximately 90 mm. According to the results shown, this would lead to a sensing range of approximately 450 mm to 3200 mm. It should be noted that the cameras used on ODK3 are different, so results may vary.
[00123] From the baseline study summarised in the figure, it is clear that as the baseline increases, the maximum distinguishable distance also increases. However, this comes at the cost of an increased minimum distinguishable distance. At the maximum baseline of 120 mm, the maximum distinguishable distance appears to converge. This is because at this distance the size of the object used is approaching the pixel size. Testing on larger objects, the maximum sensing range is found to be approximately 6 m at a large baseline. Additionally, in Figure 11 (outdoor 1105), it can be seen that the estimation of the maximum distance is much less consistent than the estimation of the minimum distance, as quantified by the wider error bars. This is because the disparity-depth relationship is not linear, and low disparities can represent a large range of depths, creating a fall-off in depth resolution at low disparities / high depths. Due to this lack of certainty and the noise, it is difficult to assess exactly when the object ceases to be visible.
[00124] The depth algorithm described herein, as part of the present invention, produces usable depth maps on hardware with various specifications. For example, when deployed on the CPU of a Dell G15 laptop, the algorithm can operate at around 150 frames per second. The depth algorithm is also applicable to hardware with more limited resources; memory-efficiency measures and further computational simplifications may be adopted.
[00125] The depth estimation is reliant on the alignment and synchronisation of the video feeds provided by the ODKs, and this will be easier to guarantee when the cameras are controlled by a single ODK, which will be the case with ODK3.
[00126] The amount, frequency, size, and distribution of errors have not yet been possible to quantify. This is due to the lack of ground-truth data against which to compare the predicted depth maps. Work has thus been undertaken to implement the stereo system within the Opteran ODK Simulator, the results of which may help quantify and reduce the errors by validating against results from LiDAR and Intel RealSense systems. This will provide simulated image pairs alongside a ground truth depth map. An error map can be calculated by performing the proposed depth estimation pipeline on the image pair and comparing it to the ground truth data. From this error map, quantifications about error and error sources can be made.
[00127] Figure 11 is a pictorial diagram of results produced in outdoor 1105 and indoor 1101, 1103 environments and the performance of the stereo depth algorithm therein. In an outdoor environment, for example to gain a sense of the distance at which a vehicle would be sensed, the data summarised at 1105 were collected. On the right of the figure, it can be seen that the maximum range at which a vehicle can be distinguished from the background is around 6 m. For autonomous vehicle applications, it may be desirable to filter out sensing of the road surface.
[00128] Another area of interest is the ability to sense plain walls in indoor 1101, 1103 environments. The left side of the figure (1101, 1103) shows the stereo sensor held close to two examples of white painted walls. As can be seen, the algorithm can indeed detect these walls. This is achieved by detecting some disparity and then inferring a shared depth via the WLS filter. If a wall were entirely blank, the depth estimation might be of lower quality; however, it is rare that there would be no feature to match and infer a shared depth from, such as the plug shown at left-top 1101 and the light switch shown at left-bottom 1103 with respect to the walls.
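For illustration only, the edge-preserving smoothing behaviour described above can be approximated with the weighted-least-squares disparity filter available in OpenCV's ximgproc contrib module; the matcher, parameter values, file names and conventional horizontal left/right pair below are assumptions for the sketch, not the configuration of the system described herein.

```python
import cv2

# Sketch of WLS disparity filtering with OpenCV contrib (opencv-contrib-python).
# Parameter values and file names are placeholders, not the system's settings.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

left_matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=16, blockSize=5)
right_matcher = cv2.ximgproc.createRightMatcher(left_matcher)

wls = cv2.ximgproc.createDisparityWLSFilter(matcher_left=left_matcher)
wls.setLambda(8000.0)   # smoothness weight: larger values propagate depth further
wls.setSigmaColor(1.5)  # edge sensitivity: how strongly image edges block smoothing

disp_left = left_matcher.compute(left, right)
disp_right = right_matcher.compute(right, left)

# The guide image lets weakly textured regions (e.g. a plain wall) inherit a
# shared depth from nearby matched features such as a plug or light switch.
filtered = wls.filter(disp_left, left, disparity_map_right=disp_right)
```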
[00129] Figure 12 is a pictorial diagram of an initial extraction of estimated and ground-truth depth, produced using the depth algorithm running on a Dell G15 laptop CPU at around 150 frames per second. Depth estimation relies on aligning and synchronising the video feeds from the ODKs, which will be easier with ODK3 since it controls all the cameras. A simulator is being used to generate simulated image pairs with known ground-truth depth maps. This will help calculate error maps and analyse error sources.
[00130] Figure 13 is a block diagram illustrating an example computing apparatus/system 1300 that may be used to implement one or more aspects of the present invention, apparatus, method(s), and/or process(es) combinations thereof, modifications thereof, and/or as described with reference to figures 1 to 12 and/or aspects as described herein. Computing apparatus/system 1300 includes one or more processor unit(s) 1302, an input/output unit 1304, communications unit/interface 1306, a memory unit 1308 in which the one or more processor unit(s) 1302 are connected to the input/output unit 1304, communications unit/interface 1306, and the memory unit 1308. In some embodiments, the computing apparatus/system 1300 may be a server, or one or more servers networked together. In some embodiments, the computing apparatus/system 1300 may be a computer or supercomputer/processing facility or hardware/software suitable for processing or performing the one or more aspects of the system(s), apparatus, method(s), and/or process(es) combinations thereof, modifications thereof, and/or as described with reference to figures 1 to 12 and/or aspects as described herein. The communications interface 1306 may connect the computing apparatus/system 1300, via a communication network, with one or more services, devices, the server system(s), cloud-based platforms, systems for implementing subject-matter databases and/or knowledge graphs for implementing the invention as described herein. The memory unit 1308 may store one or more program instructions, code or components such as, by way of example only but not limited to, an operating system and/or code/component(s) associated with the process(es)/method(s) as described with reference to figures 1 to 12, additional data, applications, application firmware/software and/or further program instructions, code and/or components associated with implementing the functionality and/or one or more function(s) or functionality associated with one or more of the method(s) and/or process(es) of the device, service and/or server(s) hosting the process(es)/method(s)/system(s), apparatus, mechanisms and/or system(s)/platforms/architectures for implementing the invention as described herein, combinations thereof, modifications thereof, and/or as described with reference to at least one of the figure(s) 1 to 12.
[00131] In the context of the preceding figures 1 to 13, the present invention and the disclosures herein encompass one or more aspects. It should be noted that these aspects can be combined, as appropriate, with other aspects and/or options described herein.
[00132] In one aspect is a computer-implemented method for image depth estimation, the method comprising: obtaining one or more stable preprocessed images, wherein said one or more stable preprocessed images are rotationally stabilised; generating at least one disparity map based on said one or more stable preprocessed images; filtering said at least one disparity map; calculating depth measurements for said one or more stable preprocessed images based on the filtered disparity map; and outputting the depth measurements.
[00133] As an option, further comprising: receiving one or more input images of a cylindrical data structure; reducing the pixel intensity of said one or more input images; and filtering said one or more input images to obtain said one or more stable preprocessed images.
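A minimal sketch of this optional preprocessing is shown below, assuming the intensity reduction is a simple rescale and the subsequent filter is a Gaussian blur; the application does not commit to these particular operations, so both are illustrative assumptions.

```python
import cv2
import numpy as np

def preprocess(cylindrical: np.ndarray,
               intensity_scale: float = 0.5,
               blur_ksize: int = 5) -> np.ndarray:
    """Reduce pixel intensity, then filter, to yield a stable preprocessed image.

    The scale factor and Gaussian kernel size are illustrative placeholders;
    the input is assumed to be a rotationally stabilised cylindrical projection.
    """
    reduced = np.clip(cylindrical.astype(np.float32) * intensity_scale, 0, 255)
    return cv2.GaussianBlur(reduced.astype(np.uint8), (blur_ksize, blur_ksize), 0)
```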
[00134] In another aspect is an apparatus for processing a stereo image pair, the apparatus comprising: at least two cameras for capturing the stereo image pair; and one or more modules configured to: convert said one or more images to one or more stable preprocessed images that are rotationally stable, generate at least one disparity map based on said one or more stable preprocessed images, filter said at least one disparity map, calculate depth measurements for said one or more stable preprocessed images based on the filtered disparity map, and output the depth measurements for the stereo image pair.
[00135] In yet another aspect is a system for estimating depth of a stereo image pair, the system comprising: at least one processor; and a memory storing instructions that, when executed by the at least one processor, cause the system to perform any method according to the first aspect. In yet another aspect is a computer-readable medium storing instructions that, when executed by a processing device, performs operations according to at least one previous aspect.
[00136] Optionally, the cylindrical data structure is a cylindrical projection derived from a data structure of raw images captured by one or more vertically aligned 360-degree cameras. As a further option, said generating at least one disparity map based on said one or more stable preprocessed images further comprises: applying a semi-global block matching algorithm to said one or more stable preprocessed images. As a further option, the semi-global block matching algorithm comprises a cost function based on at least one pixel dissimilarity measure that is insensitive to image sampling or changes in image resolution. As a further option, the cost function is configured to generate the disparity map based on a calculated cost volume of said one or more stable preprocessed images. As a further option, said generating at least one disparity map based on said one or more stable preprocessed images further comprises: computing a cost at each pixel of said one or more stable preprocessed images; calculating a cost volume based on the computed costs; identifying at least one disparity value with minimal cost based on the cost volume; and selecting a disparity value with lowest cost for each pixel. As a further option, the method further comprises: comparing whether disparities are consistent between left and right of said one or more stable preprocessed images. As a further option, the method further comprises: identifying outliers in said at least one disparity map; and removing said outliers by segmentation. As a further option, said filtering said at least one disparity map further comprises: applying a weighted least squares disparity filter to said at least one disparity map. As a further option, the depth estimation provided is used to guide a device traversing an environment or moving in a simulated environment, by a system comprising: one or more processors configured to determine the direction of movement and move the device based on the depth estimation.
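The cost-volume, lowest-cost disparity selection and left/right consistency options above can be illustrated with a deliberately simplified block-matching sketch using an absolute-difference cost; the semi-global aggregation and sampling-insensitive dissimilarity measure of the actual method are omitted, and disparities are assumed to run along image rows, so this is a teaching aid rather than the claimed algorithm.

```python
import numpy as np

def cost_volume(left: np.ndarray, right: np.ndarray, max_disp: int) -> np.ndarray:
    """Per-pixel matching cost for each candidate disparity (absolute difference).

    Stands in for the sampling-insensitive dissimilarity measure and the
    semi-global aggregation used by the actual method. Pixels with no valid
    candidate at a given disparity keep an infinite cost."""
    h, w = left.shape
    volume = np.full((max_disp, h, w), np.inf, dtype=np.float32)
    for d in range(max_disp):
        volume[d, :, d:] = np.abs(left[:, d:].astype(np.float32)
                                  - right[:, :w - d].astype(np.float32))
    return volume

def winner_takes_all(volume: np.ndarray) -> np.ndarray:
    """Select, for each pixel, the disparity value with the lowest cost."""
    return np.argmin(volume, axis=0)

def left_right_consistent(disp_left: np.ndarray, disp_right: np.ndarray,
                          tol: int = 1) -> np.ndarray:
    """Flag pixels whose left and right disparities agree within a tolerance;
    inconsistent pixels can then be treated as outliers and removed."""
    h, w = disp_left.shape
    cols = np.tile(np.arange(w), (h, 1))
    rows = np.tile(np.arange(h)[:, None], (1, w))
    matched_cols = np.clip(cols - disp_left, 0, w - 1)
    return np.abs(disp_left - disp_right[rows, matched_cols]) <= tol
```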
[00137] The embodiments, examples, and aspects of the invention as described above, such as the depth estimation process(es), method(s), system(s) and/or apparatus, may be implemented on and/or comprise one or more cloud platforms, one or more server(s) or computing system(s) or device(s). A server may comprise a single server or network of servers, and the cloud platform may include a plurality of servers or a network of servers. In some examples the functionality of the server and/or cloud platform may be provided by a network of servers distributed across a geographical area, such as a worldwide distributed network of servers, and a user may be connected to an appropriate one of the network of servers based upon a user location and the like.
[00138] The above description discusses embodiments and aspects of the invention with reference to a single user for clarity. It will be understood that in practice the system may be shared by a plurality of users, and possibly by a very large number of users simultaneously.
[00139] The embodiments and aspects described above may be configured to be semi-automatic and/or are configured to be fully automatic. In some examples a user or operator of the system(s)/process(es)/method(s) may manually instruct some steps of the process(es)/method(s) to be carried out.
[00140] The described embodiments of the invention, such as a system, process(es), method(s) and/or tool for image depth estimation and the like according to the invention and/or as herein described, may be implemented as any form of computing and/or electronic device. Such a device may comprise one or more processors which may be microprocessors, controllers or any other suitable type of processors for processing computer executable instructions to control the operation of the device in order to gather and record routing information. In some examples, for example where a system on a chip architecture is used, the processors may include one or more fixed function blocks (also referred to as accelerators) which implement a part of the process/method in hardware (rather than software or firmware). Platform software comprising an operating system or any other suitable platform software may be provided at the computing-based device to enable application software to be executed on the device.
[00141] Various functions described herein can be implemented in hardware, software, or any combination thereof. If implemented in software, the functions can be stored on or transmitted over as one or more instructions or code on a computer-readable medium or non-transitory computer-readable medium. Computer-readable media may include, for example, computer-readable storage media. Computer-readable storage media may include volatile or non-volatile, removable or non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. A computer-readable storage medium can be any available storage medium that may be accessed by a computer. By way of example, and not limitation, such computer-readable storage media may comprise RAM, ROM, EEPROM, flash memory or other memory devices, CD-ROM or other optical disc storage, magnetic disc storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disc and disk, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc (BD). Further, a propagated signal is not included within the scope of computer-readable storage media. Computer-readable media also includes communication media, including any medium that facilitates transfer of a computer program from one place to another. A connection or coupling, for instance, can be a communication medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies are included in the definition of communication medium. Combinations of the above should also be included within the scope of computer-readable media.
[00142] Alternatively, or in addition, the functionality described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, hardware logic components that can be used may include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), etc.

[00143] Although illustrated as a single system, it is to be understood that the computing device may be a distributed system. Thus, for instance, several devices may be in communication by way of a network connection and may collectively perform tasks described as being performed by the computing device.
[00144] Although illustrated as a local device it will be appreciated that the computing device may be located remotely and accessed via a network or other communication link (for example using a communication interface).
[00145] The term 'computer' is used herein to refer to any device with processing capability such that it can execute instructions. Those skilled in the art will realise that such processing capabilities are incorporated into many different devices and therefore the term 'computer' includes PCs, servers, IoT devices, mobile telephones, personal digital assistants and many other devices.
[00146] Those skilled in the art will realise that storage devices utilised to store program instructions can be distributed across a network. For example, a remote computer may store an example of the process described as software. A local or terminal computer may access the remote computer and download a part or all of the software to run the program. Alternatively, the local computer may download pieces of the software as needed, or execute some software instructions at the local terminal and some at the remote computer (or computer network). Those skilled in the art will also realise that by utilising conventional techniques known to those skilled in the art that all, or a portion of the software instructions may be carried out by a dedicated circuit, such as a DSP, programmable logic array, or the like.
[00147] It will be understood that the benefits and advantages described above may relate to one embodiment or may relate to several embodiments. The embodiments are not limited to those that solve any or all of the stated problems or those that have any or all of the stated benefits and advantages. Variants should be considered to be included within the scope of the invention.
[00148] Any reference to 'an' item refers to one or more of those items. The term 'comprising' is used herein to mean including the method steps or elements identified, but that such steps or elements do not comprise an exclusive list and a method or apparatus may contain additional steps or elements.
[00149] As used herein, the terms "component" and "system" are intended to encompass computer-readable data storage that is configured with computer-executable instructions that cause certain functionality to be performed when executed by a processor. The computer-executable instructions may include a routine, a function, or the like. It is also to be understood that a component or system may be localised on a single device or distributed across several devices. Further, as used herein, the terms "exemplary", "example" or "embodiment" are intended to mean "serving as an illustration or example of something". Further, to the extent that the term "includes" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim.
[00150] The figures illustrate exemplary methods. While the methods are shown and described as being a series of acts that are performed in a particular sequence, it is to be understood and appreciated that the methods are not limited by the order of the sequence. For example, some acts can occur in a different order than what is described herein. In addition, an act can occur concurrently with another act. Further, in some instances, not all acts may be required to implement a method described herein.
[00151] Moreover, the acts described herein may comprise computer-executable instructions that can be implemented by one or more processors and/or stored on a computer-readable medium or media. The computer-executable instructions can include routines, sub-routines, programs, threads of execution, and/or the like. Still further, results of acts of the methods can be stored in a computer-readable medium, displayed on a display device, and/or the like.
[00152] The order of the steps of the methods described herein is exemplary, but the steps may be carried out in any suitable order, or simultaneously where appropriate. Additionally, steps may be added or substituted in, or individual steps may be deleted from any of the methods without departing from the scope of the subject matter described herein. Aspects of any of the examples described above may be combined with aspects of any of the other examples described to form further examples without losing the effect sought.
[00153] It will be understood that the above description of a preferred embodiment is given by way of example only and that various modifications may be made by those skilled in the art.
[00154] What has been described above includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable modification and alteration of the above devices or methods for purposes of describing the aforementioned aspects, but one of ordinary skill in the art can recognise that many further modifications and permutations of various aspects are possible. Accordingly, the described aspects are intended to embrace all such alterations, modifications, and variations that fall within the scope of the appended claims.

Claims (15)

CLAIMS
1. A computer-implemented method for image depth estimation, the method comprising: obtaining one or more stable preprocessed images, wherein said one or more stable preprocessed images are rotationally stabilised; generating at least one disparity map based on said one or more stable preprocessed images; filtering said at least one disparity map; calculating depth measurements for said one or more stable preprocessed images based on the filtered disparity map; and outputting the depth measurements.
2. The method of claim 1, further comprising: receiving one or more input images of a cylindrical data structure; reducing the pixel intensity of said one or more input images; and filtering said one or more input images to obtain said one or more stable preprocessed images.
3. The method of claim 2, wherein the cylindrical data structure is a cylindrical projection derived from a data structure of raw images captured by one or more vertically aligned 360-degree cameras.
4. The method of any preceding claim, wherein said generating at least one disparity map based on said one or more stable preprocessed images further comprises: applying a semi-global block matching algorithm to said one or more stable preprocessed images.
5. The method of claim 4, wherein the semi-global block matching algorithm comprises a cost function based on at least one pixel dissimilarity measure that is insensitive to image sampling or changes in image resolution.
6. The method of claim 5, wherein the cost function is configured to generate the disparity map based on a calculated cost volume of said one or more stable preprocessed images.
7. The method of any preceding claim, wherein said generating at least one disparity map based on said one or more stable preprocessed images further comprises: computing a cost at each pixel of said one or more stable preprocessed images; calculating a cost volume based on the computed costs; identifying at least one disparity value with minimal cost based on the cost volume; and selecting a disparity value with lowest cost for each pixel.
8. The method of any preceding claim, further comprising: comparing whether disparities are consistent between left and right of said one or more stable preprocessed images.
9. The method of any preceding claim, further comprising: identifying outliers in said at least one disparity map; and removing said outliers by segmentation.
10. The method of any preceding claim, wherein said filtering said at least one disparity map further comprises: applying a weighted least squares disparity filter to said at least one disparity map.
11. An apparatus for processing a stereo image pair, the apparatus comprising: at least two cameras for capturing the stereo image pair; and one or more modules configured to: convert said one or more images to one or more stable preprocessed images that are rotationally stable, generate at least one disparity map based on said one or more stable preprocessed images, filter said at least one disparity map, calculate depth measurements for said one or more stable preprocessed images based on the filtered disparity map, and output the depth measurements for the stereo image pair.
12. The apparatus of claim 11, wherein said one or more modules are further configured to perform methods according to any of claims 2 to 10.
13. A system for estimating depth of a stereo image pair, the system comprising: at least one processor; and a memory storing instructions that, when executed by the at least one processor, cause the system to perform any method of claims 1 to 10.
14. A computer-readable medium storing instructions that, when executed by a processing device, perform operations according to any of the method claims 1 to 10.
15. A system using depth estimation provided according to the methods of claims 1 to 10 in order to guide a device traversing an environment or moving in a simulated environment, the system comprising: one or more processors configured to determine the direction of movement and move the device based on the depth estimation.
GB2316262.1A 2023-10-24 2023-10-24 Stabilised 360 degree depth imaging system without image rectification Pending GB2624542A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB2316262.1A GB2624542A (en) 2023-10-24 2023-10-24 Stabilised 360 degree depth imaging system without image rectification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB2316262.1A GB2624542A (en) 2023-10-24 2023-10-24 Stabilised 360 degree depth imaging system without image rectification

Publications (2)

Publication Number Publication Date
GB202316262D0 GB202316262D0 (en) 2023-12-06
GB2624542A true GB2624542A (en) 2024-05-22

Family

ID=88970327

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2316262.1A Pending GB2624542A (en) 2023-10-24 2023-10-24 Stabilised 360 degree depth imaging system without image rectification

Country Status (1)

Country Link
GB (1) GB2624542A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022248824A1 (en) * 2021-05-25 2022-12-01 Opteran Technologies Limited Unconstrained image stabilisation

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022248824A1 (en) * 2021-05-25 2022-12-01 Opteran Technologies Limited Unconstrained image stabilisation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Hirschmuller H., "Stereo Processing by Semiglobal Matching and Mutual Information", IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 2, pp. 328-341, 2007 *

Also Published As

Publication number Publication date
GB202316262D0 (en) 2023-12-06

Similar Documents

Publication Publication Date Title
US11562498B2 (en) Systems and methods for hybrid depth regularization
JP7173772B2 (en) Video processing method and apparatus using depth value estimation
US10679361B2 (en) Multi-view rotoscope contour propagation
US10699476B2 (en) Generating a merged, fused three-dimensional point cloud based on captured images of a scene
US10373380B2 (en) 3-dimensional scene analysis for augmented reality operations
US8447099B2 (en) Forming 3D models using two images
US8452081B2 (en) Forming 3D models using multiple images
US10477178B2 (en) High-speed and tunable scene reconstruction systems and methods using stereo imagery
CN111325796A (en) Method and apparatus for determining pose of vision device
Wu et al. Passive measurement method of tree diameter at breast height using a smartphone
US10853960B2 (en) Stereo matching method and apparatus
JP2004334819A (en) Stereo calibration device and stereo image monitoring device using same
KR20120040924A (en) Image processing apparatus and method
Teixeira et al. Real-time mesh-based scene estimation for aerial inspection
US11651581B2 (en) System and method for correspondence map determination
CN111915723A (en) Indoor three-dimensional panorama construction method and system
KR102665603B1 (en) Hardware disparity evaluation for stereo matching
KR20230049969A (en) Method and apparatus for global localization
JP7298687B2 (en) Object recognition device and object recognition method
GB2624542A (en) Stabilised 360 degree depth imaging system without image rectification
Kovacs et al. Edge detection in discretized range images
Wang et al. A Solution for 3D Visualization on Soil Surface Using Stereo Camera