CN111491154A - Detection and ranging based on one or more monoscopic frames - Google Patents

Detection and ranging based on one or more monoscopic frames

Info

Publication number
CN111491154A
CN111491154A (application CN201910328119.4A)
Authority
CN
China
Prior art keywords
image
stereoscopic
user input
digital image
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910328119.4A
Other languages
Chinese (zh)
Inventor
Behrooz Maleki (贝赫鲁兹·马利基)
Sarvenaz Sakhosh (萨尔文纳兹·萨克霍什)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bitanimate Inc
Original Assignee
Bitanimate Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bitanimate Inc filed Critical Bitanimate Inc
Publication of CN111491154A publication Critical patent/CN111491154A/en
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/20 Image signal generators
    • H04N 13/204 Image signal generators using stereoscopic image cameras
    • H04N 13/239 Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
    • H04N 13/207 Image signal generators using stereoscopic image cameras using a single 2D image sensor
    • H04N 13/211 Image signal generators using stereoscopic image cameras using a single 2D image sensor using temporal multiplexing
    • H04N 13/261 Image signal generators with monoscopic-to-stereoscopic image conversion
    • H04N 13/271 Image signal generators wherein the generated image signals comprise depth maps or disparity maps
    • H04N 13/275 Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
    • H04N 2013/0074 Stereoscopic image analysis
    • H04N 2013/0081 Depth or disparity estimation from stereoscopic image signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Studio Devices (AREA)
  • Control Of Driving Devices And Active Controlling Of Vehicle (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A method may include obtaining, via a detection application, a first digital image, where the first digital image may be a monoscopic image depicting a scene from a first location of a camera sensor communicatively coupled to the detection application. Additionally, the method may include generating, based on the first digital image, a second digital image that is also a monoscopic image and that depicts the scene from a second location different from the first location. Further, the method may include generating a stereoscopic image of the scene that includes the first digital image and the second digital image.

Description

Detection and ranging based on one or more monoscopic frames
Technical Field
Embodiments discussed herein relate to detection and ranging based on one or more monoscopic frames.
Background
The demand for detection and ranging applications has increased with the advent of autonomous and semi-autonomous vehicles. To help facilitate autonomous and semi-autonomous operation of vehicles, the ability to detect and range objects in the environment is becoming increasingly useful.
The claimed subject matter is not limited to embodiments that solve any disadvantages or that operate only in the environments described above. Rather, this background is intended to be merely illustrative of one exemplary technology in which some embodiments of the invention described herein may be practiced.
Disclosure of Invention
Embodiments of the present invention relate to detection and ranging based on one or more monoscopic frames, including related systems and methods. A method may include obtaining, via a detection application, a first digital image, where the first digital image may be a monoscopic image depicting a scene from a first location of a camera sensor communicatively coupled to the detection application. Additionally, the method may include generating, based on the first digital image, a second digital image that is also a monoscopic image and that depicts the scene from a second location different from the first location. Further, the method may include generating a stereoscopic image of the scene that includes the first digital image and the second digital image.
The objects and advantages of the embodiments will be realized and attained by at least the elements, features, and combinations particularly pointed out in the claims.
The foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
Example embodiments will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
FIG. 1A illustrates an example system configured to generate stereoscopic (3D) images;
FIG. 1B illustrates an example environment in which generating stereoscopic images based on one or more monoscopic frames occurs;
FIG. 1C shows an example embodiment in which a camera may capture a first digital image at a first location and a second digital image at a second location;
FIG. 1D shows an example embodiment in which a camera may capture a first digital image in a first rotational position and a second digital image in a second rotational position;
FIG. 2 illustrates an example flow diagram of a method for mapping and/or detecting and ranging objects based on one or more monoscopic frames;
FIG. 3 illustrates an example system that may be used for detection and ranging of objects based on one or more monoscopic frames;
FIG. 4 illustrates an example depth map;
FIG. 5 shows an example stereo pair of images;
FIG. 6A illustrates an example stereoscopic depiction of a scene displayed on a display screen and corresponding to a first example stereoscopic image;
FIG. 6B illustrates an example stereoscopic depiction of a scene displayed on a display screen and corresponding to a second example stereoscopic image;
FIG. 6C illustrates an example stereoscopic depiction of a scene displayed on a display screen and corresponding to a third example stereoscopic image;
FIG. 6D illustrates an example stereoscopic depiction of a scene displayed on a display screen and corresponding to a fourth example stereoscopic image;
FIG. 6E illustrates an example stereoscopic depiction of a scene displayed on a display screen and corresponding to a fifth example stereoscopic image;
FIG. 6F illustrates an example stereoscopic depiction of a scene displayed on a display screen and corresponding to a sixth example stereoscopic image;
FIG. 6G illustrates an example stereoscopic depiction of a scene displayed on a display screen and corresponding to a seventh example stereoscopic image;
fig. 7 illustrates an example embodiment of a visual depiction of a screen with respect to a user interface that may be configured to control adjustment of a stereoscopic image.
Detailed Description
In addition to detecting and ranging objects in an environment, other considerations for autonomous and semi-autonomous operation of a vehicle may include safety functions, such as staying on the driving trajectory and avoiding collisions with objects. Accordingly, systems have been developed for detection, ranging, and/or safety purposes.
For example, RADAR uses radio signals to detect and range objects, while LIDAR uses laser signals to detect and range objects. For such sensors (and for the approximately eight cameras used by some autonomous/semi-autonomous vehicles), in addition to cost, size, and/or ease of implementation, technical limitations may also be a factor. For example, LIDAR may have limited use at night, in cloudy weather, or at high altitudes (e.g., above 2000 meters).
In addition, humans have binocular vision systems using two eyes spaced about 2.5 inches apart (about 6.5 centimeters). Each eye sees the world from a slightly different perspective. The brain uses the difference in these perspectives to calculate or estimate distance. Such binocular vision systems are part of the ability to determine the distance of an object with relatively good accuracy. The relative distances of the plurality of objects in the field of view may also be determined by means of binocular vision.
Three-dimensional (stereoscopic) imaging exploits the depth perceived by binocular vision by presenting two images to a viewer, where one image is presented to one eye (e.g., the left eye) and the other image is presented to the other eye (e.g., the right eye). The images presented to both eyes may include substantially identical elements, but the elements in the two images may be offset from each other to simulate an offset viewing angle that may be perceived by the eyes of a viewer in everyday life. Thus, the viewer may perceive the depth of the element depicted by the image.
According to one or more embodiments of the invention, one or more stereoscopic images may be generated based on a single monoscopic image obtained from a camera sensor. The stereoscopic images may each include a first digital image and a second digital image, where the stereoscopic images, when viewed using any suitable stereoscopic viewing technique, may cause a user or software program to receive a three-dimensional effect for an element included in the stereoscopic images. The monoscopic images may depict a geographic scene at a particular geographic location, and the resulting stereoscopic images may provide a three-dimensional (3D) rendering of the geographic scene. The use of stereo images may help the system achieve more accurate detection and ranging capabilities. Reference to a "stereoscopic image" in the present disclosure may refer to any configuration of a first digital image (monoscopic) and a second digital image (monoscopic) that may together produce a 3D effect as perceived by a viewer or software program.
Fig. 1A illustrates an example system 100 configured to generate stereoscopic (3D) images according to some embodiments of the invention. The system 100 may include a stereo image generation module 104 (hereinafter referred to simply as "stereo image module 104") configured to generate one or more stereo images 108. The stereoscopic image module 104 may include any suitable system, apparatus, or device configured to receive the monoscopic images 102 and generate each of the stereoscopic images 108 based on two or more of the monoscopic images 102. For example, in some embodiments, the stereoscopic image module 104 may include software containing computer-executable instructions configured to cause a processor to perform operations for generating the stereoscopic image 108 based on the monoscopic image 102.
In some embodiments, monoscopic image 102 may include a digital image obtained by a camera sensor to depict a scene. For example, monoscopic image 102 may include a digital image depicting objects in a scene. In some embodiments, the object may be any visually detectable element, such as a tree, a pedestrian, a bird, an airplane, an airborne missile, a ship, a buoy, a river or ocean, a curb, a traffic sign, a traffic line (e.g., a double line indicating a no-passing zone), a mountain, a wall, a house, a fire hydrant, a dog, or any other object that is visually detectable by a camera sensor. Alternatively or additionally, in some embodiments, the monoscopic image 102 may include a digital image depicting a bird's eye view of a geographic scene. For example, the monoscopic image 102 may include a digital image depicting a bird's eye view of a geographic scene captured by an aircraft, satellite, telescope, or the like. In some cases, one or more of the monoscopic images 102 may depict a bird's eye view from a top-down vertical perspective, looking vertically down or substantially vertically down at the geographic scene. In these or other cases, one or more of the monoscopic images 102 may depict a bird's eye view from an oblique perspective other than looking vertically down at the geographic scene. In some embodiments, the stereoscopic image module 104 may be configured to acquire the monoscopic image 102 via a detection application communicatively coupled to a camera sensor. As described herein, a "detection application" is short for a "detection and ranging application".
In some embodiments, the stereoscopic image module 104 may be configured to access a detection application (such as the detection application 124 of fig. 1B) via any suitable network (such as the network 128 of fig. 1B) to request the monoscopic image 102 from the detection application. In these or other embodiments, the detection application and associated monoscopic images 102 may be stored on the same device, which may include the stereoscopic image module 104. In these or other embodiments, the stereoscopic image module 104 may be configured to access a detection application stored on a device that may store monoscopic images 102 to request monoscopic images 102 from a storage area of the device.
Alternatively or additionally, the stereoscopic image module 104 may be included with the detection application, wherein the stereoscopic image module 104 may obtain the monoscopic image 102 via the detection application by accessing a portion of the monoscopic image 102 controlled by the detection application. In other embodiments, the stereoscopic image module 104 may be separate from the detection application (e.g., as shown in fig. 1B), but may be configured to interface with the detection application to obtain the monoscopic image 102.
The stereo image module 104 may be configured to generate a stereo image 108 as shown below. To help explain these concepts, a description is given (shown in FIG. 1B and described below) with respect to generating an example stereo image 120, which example stereo image 120 is an example of one of the stereo images 108 of FIG. 1A. Further, a description is given regarding generating the stereoscopic image 120 based on the example first digital image 110 and the example second digital image 112 shown in FIG. 1B. First digital image 110 and second digital image 112 are examples of monoscopic images that may be included in monoscopic image 102 of FIG. 1A.
FIG. 1B illustrates an example environment 105 in which generating stereoscopic images based on one or more monoscopic frames occurs. The elements of FIG. 1B may be arranged in accordance with one or more embodiments of the invention. As shown, fig. 1B includes: a machine 122 having a detection application 124 and a computing system 126, a network 128, and a stereo image module 130 having a graphics-based model 132 and a computing system 134. Further shown is a scene 109, a first digital image 110, a second digital image 112, a focal point 113, a camera 114, a focal length 115a/115b, a virtual camera 116 and a displacement factor 118. In some embodiments, the stereoscopic image module 130 may be the same as or similar to the stereoscopic image module 104 described above in connection with fig. 1A. Alternatively or additionally, computing system 126 and computing system 134 may be the same as or similar to system 300 described below in connection with fig. 3.
In some embodiments, scene 109 may include any geographic scene in which camera 114 may capture images. For example, scene 109 may include a garage, a driveway, a street, a sidewalk, a sea, a river, the sky, a forest, a city, a village, a landing/launch area (such as airport runways and flight decks), a warehouse, a store, an inventory passage, and any other suitable environment in which machine 122 may detect and range objects. Thus, when camera 114 captures first digital image 110, first digital image 110 may include any aspect and/or portion of scene 109. Alternatively or additionally, the first digital image 110 may include a focal point 113 based on a focal length 115a of the camera 114. In these or other embodiments, the focal length 115a to the focal point 113 may be a known constant based on the specifications of the camera 114.
In some embodiments, camera 114 may be attached to machine 122. In the present disclosure, reference to a "machine" may refer to any device configured to store and/or execute computer code (e.g., executable instructions of a software application). In some embodiments, the machine may move from a first geographic location (e.g., "point a") to a second geographic location (e.g., "point B"). In these or other embodiments, machines 122 may be autonomous or semi-autonomous with respect to movement between geographic locations. Alternatively, machines 122 may be moved between geographic locations by manual operation. Examples of machine 122 may include a robot, drone, rocket, space station, autonomous car/truck, manually operated car/truck, equipment (e.g., construction/maintenance equipment such as backhoes, street sweepers, steam rollers, etc.), storage pods (e.g., mobile storage units, etc.), or any other suitable equipment configured to move between geographic locations.
Alternatively or additionally, the machine may include a fixed device, which in some embodiments is fixed in position. For example, the machine may comprise anti-missile equipment located at a military base, security equipment secured around a prison, a hovering helicopter, or any other suitable machine, whether temporarily fixed or permanently fixed. Alternatively or additionally, the machine may comprise a client device. Some examples of client devices may include mobile phones, smart phones, tablet computers, notebook computers, desktop computers, set-top boxes, virtual reality devices, wearable devices, connected devices, any mobile device with an operating system, and satellites, among others.
In these or other embodiments, the detection and ranging functions of machine 122 enabled by the present application may be advantageous in any field or industry, including, for example, commercial/industrial use, manufacturing use, military use (e.g., army, navy, marine corps, national guard, air force, and space force), government agency use (e.g., the Federal Bureau of Investigation, the Central Intelligence Agency, and the National Transportation Safety Board), and the like.
Alternatively or additionally, machine 122 may detect and/or range along a trajectory. The trajectory may include any travel path and/or surrounding area of machine 122, whether in the air, on land, in space, or on water. In these or other embodiments, camera 114 may be configured to capture a portion of the trajectory of machine 122 in first digital image 110, e.g., a portion of the trajectory closest to machine 122, a portion of the trajectory furthest from machine 122, or another portion of the trajectory that is not necessarily the closest to or furthest from machine 122. By way of example, camera 114 may capture a portion of the trajectory of machine 122 up to about 2 meters from machine 122, up to about 5 meters from machine 122, up to about 20 meters from machine 122, up to about 50 meters from machine 122, up to about 100 meters from machine 122, up to about 200 meters from machine 122, up to about 500 meters from machine 122, up to about 1000 meters from machine 122, up to about 5000 meters from machine 122, and so forth. Advances in camera technology, including camera lens technology, may continue to yield improvements in imaging speed, resolution, measurement accuracy, and focal length.
In some embodiments, the first digital image 110 captured by the camera 114 may be obtained by the detection application 124. For example, detection application 124 may request first digital image 110 from camera 114. Additionally or alternatively, the detection application 124 may receive the first digital image 110 sent from the camera 114.
In these or other embodiments, the stereoscopic image module 130 may obtain the first digital image 110 from the detection application 124. For example, the stereo image module 130 may request the first digital image 110 from the detection application 124. Additionally or alternatively, the stereo image module 130 may receive the first digital image 110 sent from the detection application 124. In these or other embodiments, the first digital image 110 may be obtained by the stereo image module 130 via the network 128, for example, as shown in fig. 1B, the stereo image module 130 is located remotely from the machine 122 (such as a remote server). The remote server may be the same as or similar to computing system 134. Additionally or alternatively, the remote server may include one or more computing devices (such as a rack-mounted server, router computer, server computer, personal computer, mainframe computer, notebook computer, tablet computer, desktop computer, smart phone, automobile, drone, robot, any mobile device with an operating system, etc.), data storage (e.g., hard disk, memory, database), networks, software components, and/or hardware components. In other embodiments, the stereoscopic image module 130 may obtain the first digital image 110 without the network 128, for example, the stereoscopic image module 130 is integrated with the machine 122 (e.g., not located at a remote server).
Additionally or alternatively, network 128 may have any suitable topology or configuration, including a star configuration, a token ring configuration, or another configuration. Network 128 may include a local area network (LAN), a wide area network (WAN) (such as the Internet), DECT ULE, and/or other interconnected data paths across which multiple devices may communicate. Network 128 may also include Bluetooth communication networks (e.g., Bluetooth MESH) and/or cellular communication networks for sending and receiving data, including data sent via Short Message Service (SMS), Multimedia Message Service (MMS), Hypertext Transfer Protocol (HTTP), direct data connection, Wireless Application Protocol (WAP), email, and the like. Additionally, network 128 may include WiFi, NFC, LTE, LTE-Advanced, 1G, 2G, 3G, 4G, 5G, a long-range low-power wireless technology intended to enable low data rate communication over long distances by sensors and actuators for machine-to-machine communication as well as internet of things (IoT) applications, wireless USB, or any other such wireless technology.
In some embodiments, after the first digital image 110 is obtained by the stereo image module 130, the stereo image module 130 may input the first digital image 110 into the graphics-based model 132. As described herein, the term "graph-based model" may include a deep neural network, a deep belief network, a recurrent neural network, or some other graphical model (such as a genetic programming model or a tree-based or forest-based machine learning model). Thus, the graph-based model 132 may include any artificial intelligence system or learning-based mechanism, examples of which may include: a perceptron, multi-layer perceptron, feedforward network, radial basis network, deep feedforward network, recurrent neural network, long short-term memory (LSTM) network, gated recurrent unit, autoencoder, variational autoencoder, denoising autoencoder, sparse autoencoder, any sequence-to-sequence model, shallow neural network, Markov chain, Hopfield network, Boltzmann machine, restricted Boltzmann machine, deep belief network, deep convolutional network, convolutional neural network (e.g., VGG-16), deconvolutional network, deep convolutional inverse graphics network, modular neural network, generative adversarial network, liquid state machine, extreme learning machine, echo state network, recursive neural network, deep residual network, Kohonen network, support vector machine, neural Turing machine, and the like.
In some embodiments, the graphics-based model 132 may be trained to generate (e.g., with the aid of the computing system 134) the second digital image 112 based on input in the form of the first digital image 110. The training of the graph-based model 132 is described later in this disclosure. In these or other embodiments, the second digital image 112 may be configured as an image of the same or a similar area of the scene 109. Thus, in some embodiments, the first digital image 110 and the second digital image 112 may substantially overlap. In these or other embodiments, data corresponding to portions of the first digital image 110 that do not overlap with the second digital image 112 may be discarded. Additionally or alternatively, the second digital image 112 may be generated as a monoscopic image that visually simulates what the virtual camera 116 would capture if the virtual camera 116 were an actual camera like the camera 114. In these or other embodiments, the virtual camera 116 is positioned at a different location than the actual location of the camera 114. Thus, in some embodiments, an imaged object in the first digital image 110 may be imaged from a first position and/or at a first angle. Alternatively or additionally, the object may be imaged in the second digital image 112 from a second position and/or at a second angle, such that the second position and/or the second angle differ from the first position and the first angle, respectively. In this manner, the first digital image 110 captured by the camera 114 and the second digital image 112 generated by the stereoscopic image module 130 may be employed to generate a stereoscopic image 120 having a perceivable depth.
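To make the generation step more concrete, the following is a minimal sketch, assuming a PyTorch-style encoder-decoder that maps the captured first digital image to a synthesized second (virtual-camera) view. The architecture, layer sizes, and function names are illustrative assumptions and are not taken from the patent.

```python
# Hypothetical sketch: synthesizing the "virtual camera" view (second digital image)
# from a single captured monoscopic frame with a trained image-to-image network.
# The model architecture and any trained weights are placeholders, not the patent's.
import torch
import torch.nn as nn


class ViewSynthesisNet(nn.Module):
    """Toy encoder-decoder that maps a left view to an estimated right view."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, first_image: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(first_image))


def generate_second_image(model: ViewSynthesisNet, first_image: torch.Tensor) -> torch.Tensor:
    """Inference only: the returned tensor plays the role of the second digital image."""
    model.eval()
    with torch.no_grad():
        return model(first_image)

# Usage sketch: first_image is a normalized (1, 3, H, W) tensor decoded from the camera frame;
# second_image = generate_second_image(trained_model, first_image)
```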
In these or other embodiments, the positional relationship of the camera 114 relative to the virtual camera 116 may include a displacement factor 118. As described herein, the displacement factor 118 may include: angles or directions for one or more axes (e.g., roll, pitch, and yaw), offset lateral distances, or offset longitudinal heights, etc. In some embodiments, the displacement factor 118 may be a known constant. Alternatively or additionally, the displacement factor 118 may be set to a value such that the stereoscopic image 120 produced from the second digital image 112 is of sufficiently good quality and accuracy. For example, the displacement factor 118 may be set to a value such that the distance measurement based on the stereo image 120 is sufficiently accurate and/or suitable for a particular model.
In other embodiments, rather than generating the stereoscopic image 120 from the second digital image 112 generated as described above (e.g., from the graphics-based model 132), the second digital image 112 may be captured, in addition to the first digital image 110, to generate the stereoscopic image 120. For example, the first digital image 110 and the second digital image 112 may be captured by the camera 114 at different positions or at different angles of rotation, which may also affect the size, shape, etc. of the overlap region between the first digital image 110 and the second digital image 112. For example, FIG. 1C illustrates an embodiment in which camera 114 may capture first digital image 110 at first location 136a and may capture second digital image 112 at second location 136b. The solid triangle of FIG. 1C may correspond to the field of view of camera 114 at first position 136a, and the dashed triangle of FIG. 1C may correspond to the field of view of camera 114 after moving from first position 136a to second position 136b. As shown, FIG. 1C depicts a side view of camera 114 and the field of view of camera 114 at first and second positions 136a and 136b. In some embodiments, the distance between the first location 136a and the second location 136b may be related to the lateral offset between the first region and the second region.
Alternatively or additionally, the distance between first position 136a and second position 136b may be determined according to a target offset, which may be based on a target degree of 3D effect. For example, in some embodiments, the second digital image 112 may be requested based on coordinates that may be associated with the first region such that one or more of the coordinates may also be included in the second region, but offset in the second digital image 112 by the target offset as compared to the coordinate location in the first digital image 110.
As another example, FIG. 1D illustrates an example where the camera 114 may capture the first digital image 110 at a first rotational position 138a and may capture the second digital image 112 at a second rotational position 138 b. The solid triangle of FIG. 1D may correspond to the field of view of camera 114 at first rotational position 138a, and the dashed triangle of FIG. 1D may correspond to the field of view of camera 114 at second rotational position 138 b. The amount of rotation between the first rotational position 138a and the second rotational position 138b also affects the lateral offset between the first region and the second region. Alternatively or additionally, in some embodiments, the amount of rotation of the first and second directions toward each other may be based on a target angle of rotation. In some embodiments, the target rotation angle may be based on the target 3D effect. Alternatively or additionally, the target rotation angle may be based on a target focus (e.g., focus 113) for achieving the target 3D effect.
Capturing the first digital image 110 and the second digital image 112 according to fig. 1C-1D would not only cause the first region and the second region to comprise different sized portions of the scene 109, but the perspective of the scene 109 would also be different for different camera angles. Different viewing angles may also affect the shape and size of the overlap region.
Any suitable technique may be used to determine the overlap region between the first digital image 110 and the second digital image 112. For example, in some embodiments, the overlap region may be determined based on a comparison of image data included in pixels of the first digital image 110 and the second digital image 112 to determine which elements of the scene 109 may be depicted in both the first digital image 110 and the second digital image 112. Alternatively or additionally, the overlap region may be determined using and based on geometric principles, which may be associated with: camera position during capture of first digital image 110 and second digital image 112, camera rotation during capture of first digital image 110 and second digital image 112, relative relationship of first orientation and second orientation, amount of offset between first region and second region and amount of tilt in bird's eye view of first digital image 110 and second digital image 112, scaling factor of first digital image 110, scaling factor of second digital image 112, size of first digital image 110, size of second digital image 112, size of first region, size of second region, and the like.
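As a concrete illustration of the pixel-comparison approach mentioned above, the sketch below estimates the overlap region by template-matching a central patch of the first digital image inside the second. OpenCV, the patch size, and the assumption that both images have the same dimensions are illustrative choices; the patent does not prescribe a specific matching algorithm.

```python
# Illustrative sketch (not from the patent): estimating the overlap region between
# the first and second digital images via normalized cross-correlation template matching.
import cv2
import numpy as np


def estimate_overlap(first_img: np.ndarray, second_img: np.ndarray, patch_frac: float = 0.5):
    """Return the apparent (dx, dy) shift and the overlap bounding box in the second image.

    Assumes both images are the same size and depict largely the same scene.
    """
    h, w = first_img.shape[:2]
    ph, pw = int(h * patch_frac), int(w * patch_frac)
    y0, x0 = (h - ph) // 2, (w - pw) // 2
    patch = first_img[y0:y0 + ph, x0:x0 + pw]

    # Normalized cross-correlation of the central patch against the second image.
    result = cv2.matchTemplate(second_img, patch, cv2.TM_CCOEFF_NORMED)
    _, _, _, (mx, my) = cv2.minMaxLoc(result)

    dx, dy = mx - x0, my - y0  # apparent shift between the two views
    # Overlap of the first-image frame, shifted by (dx, dy), with the second-image frame.
    ox0, oy0 = max(0, dx), max(0, dy)
    ox1, oy1 = min(w, w + dx), min(h, h + dy)
    return (dx, dy), (ox0, oy0, ox1, oy1)
```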
In some embodiments, the third digital image may be obtained based on the overlap region between the first digital image 110 and the second digital image 112. For example, in some embodiments, the third digital image may be obtained based on the second sub-region corresponding to the overlap region and depicted in the second digital image. Further, the third digital image may be obtained based on the size (e.g., resolution), aspect ratio, and dimensions (e.g., number of horizontal and vertical pixels) of the first digital image 110, such that the third digital image may have substantially the same size, aspect ratio, and dimensions.
In some embodiments, the stereo image 120 may be used to generate a depth map. For example, the detection application 124 and/or the stereo image module 130 may generate a depth map. An example of a depth map is shown in fig. 4. The depth map may include corresponding pixels for each pixel in the stereoscopic image 120. Each corresponding pixel in the depth map may represent relative distance data from camera 114 for each respective pixel in stereoscopic image 120. For example, a pixel in the depth map having a particular shade of purple or gray may correspond to a particular relative distance that is not an actual distance value. Thus, in some embodiments, the pixels in the first depth map and the pixels in the second depth map may comprise the same color or shade of gray, but have different actual distance values (e.g., even orders of magnitude different actual distance values). In this way, the color or grayscale in the generated depth map does not represent the actual distance value of the pixel; conversely, the color or grayscale of a pixel in the generated depth map may represent a distance value relative to a neighboring pixel.
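One common way to produce such a relative-distance (disparity) map from the stereo pair is semi-global block matching; the sketch below uses OpenCV's implementation with illustrative parameters. This is an assumed approach consistent with the description above, not the specific method claimed in the patent, and the resulting values are relative rather than actual distances.

```python
# Hedged example: deriving a relative-depth (disparity) map from the stereoscopic pair.
import cv2
import numpy as np


def relative_depth_map(left_gray: np.ndarray, right_gray: np.ndarray) -> np.ndarray:
    """Return a disparity map; larger disparity means the pixel is relatively closer to the camera."""
    matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=128,   # must be a multiple of 16
        blockSize=5,
        P1=8 * 3 * 5 ** 2,
        P2=32 * 3 * 5 ** 2,
        uniquenessRatio=10,
    )
    # compute() returns fixed-point disparity scaled by 16.
    return matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0


def to_grayscale_depth_image(disparity: np.ndarray) -> np.ndarray:
    """Scale the relative distance data to 0-255 for display; shades, not actual distances."""
    valid = disparity[disparity > 0]
    lo, hi = (valid.min(), valid.max()) if valid.size else (0.0, 1.0)
    return np.uint8(255 * np.clip((disparity - lo) / max(hi - lo, 1e-6), 0, 1))
```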
In some embodiments, a subset of pixels of the total amount of pixels in the depth map may be associated with the object. For example, the detection application 124 and/or the stereo image module 130 may determine that a subset of pixels in the depth map is indicative of an object. In this way, the presence of an object may be initially identified or detected, although it is not necessary to range the object. To range a detected object, a portion of a subset of pixels associated with the object may be analyzed. In some embodiments, portions of the subset of pixels associated with the object may be analyzed instead of the entirety of the subset of pixels associated with the object to reduce computational overhead, increase ranging speed, and the like. For example, each pixel associated with a pedestrian (e.g., foot, leg, torso, neck, and head) need not be all range-measured. Conversely, one or more portions of pixels associated with a pedestrian may be considered to represent the pedestrian's position relative to the camera 114 for ranging purposes. In these or other embodiments, the subset of pixels associated with the object may be averaged, segmented, or otherwise reduced to be a portion of the subset of pixels. Alternatively or additionally, the resolution of one or both of the stereoscopic image 120 and the depth map may be temporarily reduced (and then restored to the original resolution). In this manner, the portion of the subset of pixels may include relative distance data that is substantially representative of the object.
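A minimal sketch of reducing the object's pixel subset to a representative portion follows, here by taking the median of valid relative-distance values inside an assumed object mask; the mask and the choice of median are illustrative assumptions.

```python
# Sketch under assumptions: a single representative relative-distance value for a detected
# object, rather than ranging every pixel belonging to the object.
import numpy as np


def representative_relative_distance(disparity: np.ndarray, object_mask: np.ndarray) -> float:
    """object_mask: boolean array marking the pixels detected as belonging to the object."""
    values = disparity[object_mask & (disparity > 0)]
    if values.size == 0:
        raise ValueError("No valid relative distance data for this object")
    return float(np.median(values))  # one value standing in for the object's position
```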
In some embodiments, the relative distance data of the object may be converted to actual distance values (e.g., in inches, feet, meters, kilometers, etc.). To convert the depth map based relative distance data into actual distance values to objects, a predetermined relationship between the relative distance data, the focal points 113 of the first and second digital images 110, 112, the displacement factor 118 between the camera 114 and the virtual camera 116, and/or a correction curve that compensates for the shift in distance measurements based on perceived depth in the stereo image may be used. In these or other embodiments, the accuracy of the relative distance data in the depth map decreases as the distance from the camera 114 increases. Thus, once the actual distance data is converted from the relative distance data, the offset from the actual distance data may be plotted or fitted to a curve as a function of the actual distance. Thus, in some embodiments, a curve of correction values may be implemented to correct for deviations from actual distance data.
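The sketch below illustrates one plausible conversion under a pinhole-camera assumption, where the uncorrected distance is proportional to the focal length times the displacement factor (baseline) divided by disparity, followed by a fitted correction curve compensating for the growing offset at longer ranges. The formula and the polynomial calibration are assumptions consistent with the description, not language from the patent.

```python
# Illustrative conversion of relative distance data (disparity) to actual distance.
import numpy as np


def raw_distance(disparity: float, focal_px: float, baseline_m: float) -> float:
    """Uncorrected distance estimate in meters: Z ~= f * B / d (assumed pinhole model)."""
    return focal_px * baseline_m / max(disparity, 1e-6)


def fit_correction_curve(estimated_m: np.ndarray, ground_truth_m: np.ndarray, degree: int = 2):
    """Fit a polynomial mapping estimated distance -> corrected distance from calibration data."""
    return np.polyfit(estimated_m, ground_truth_m, degree)


def corrected_distance(disparity: float, focal_px: float, baseline_m: float, coeffs) -> float:
    """Apply the fitted correction curve to the raw distance estimate."""
    return float(np.polyval(coeffs, raw_distance(disparity, focal_px, baseline_m)))
```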
In some embodiments, the graphics-based model 132 may be trained to generate the second digital image 112 based on a single monoscopic image, such as the first digital image 110, for subsequent generation of the stereoscopic image 120. To train the graphics-based model 132, stereo-pair images may be provided to the graphics-based model 132. The stereo pair image may include a first monoscopic image and a second monoscopic image. Fig. 5 shows an example of a stereo pair provided to a graph-based model 132 for training purposes. In these or other embodiments, the first monoscopic image and the second monoscopic image may comprise images taken from any same or similar scene, but from different positions and/or angles. In this way, the first monoscopic image and the second monoscopic image taken together may form a stereoscopic pair having an appreciable depth. Alternatively or additionally, the first monoscopic image and the second monoscopic image may include a scene 109 of any type, nature, location, or subject matter. Some stereo-pair images may be related by type, nature, location, or subject; however, in addition to increasing the number, the diversity between stereo pair images may also help to improve the training quality or performance of the graphics-based model 132 to generate the second digital image 112 and the stereo image 120 with sufficiently good quality and accuracy.
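A minimal training sketch under stated assumptions follows: a PyTorch-style loop over a dataset of stereo-pair images, reusing the toy ViewSynthesisNet from the earlier sketch and an L1 photometric loss between the synthesized and captured second views. None of these choices are mandated by the patent.

```python
# Minimal training sketch (assumptions: PyTorch, the ViewSynthesisNet toy model defined
# earlier, and a DataLoader yielding (first_view, second_view) stereo-pair tensors).
import torch
from torch.utils.data import DataLoader


def train_view_synthesis(model, stereo_pairs: DataLoader, epochs: int = 10, lr: float = 1e-4):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.L1Loss()  # photometric difference between views
    model.train()
    for _ in range(epochs):
        for first_view, second_view in stereo_pairs:
            predicted_second = model(first_view)        # synthesized second digital image
            loss = loss_fn(predicted_second, second_view)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```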
In some embodiments, the training of the graphics-based model 132 may occur on the server side, e.g., at the stereo image module 130 when located remotely from the machine 122. Alternatively or additionally, the training of the graphics-based model 132 may be a one-time process, with the generation of the second digital image 112 and the stereoscopic image 120 accomplished subsequently. In other embodiments, the training of the graph-based model 132 may occur on a rolling basis (e.g., continuously) or on an interval basis (e.g., a predetermined schedule), as desired. As an example, additional training may be desired when instances of inaccuracy or safety threats arise, for example, in the event of a safety violation or accident. In such cases, the graphics-based model 132 may be provided with additional training focused on the inaccuracy or safety threat. Alternatively or additionally, one or more aspects of the training of the graph-based model 132 may occur at the machine 122 (e.g., via the detection application 124). As an example, feedback may be received at the graph-based model 132 from the detection application 124 via machine 122, from a user of machine 122 via machine 122, from a third party such as a law enforcement officer, and so forth.
Modifications, additions, or omissions may be made to environment 105 without departing from the scope of the disclosure. For example, environment 105 may include other elements in addition to those specifically listed. Additionally, environment 105 may be included in any number of different systems or devices.
Fig. 2 illustrates an example flow diagram of a method 200 for mapping and/or detecting and ranging objects based on one or more monoscopic frames. The method 200 may be arranged in accordance with at least one embodiment described in this disclosure. In some embodiments, the method 200 may be performed in whole or in part by a software system and/or a processing system (e.g., the system 300 described below in connection with fig. 3). In these and other embodiments, some or all of the steps of the method 200 may be performed based on execution of instructions stored on one or more non-transitory computer-readable media. Although shown as discrete blocks, the various blocks may be divided into additional blocks, combined into fewer blocks, or eliminated, depending on the desired implementation.
The method 200 may begin at block 205, where a first digital image is obtained via one or both of a detection application and a camera sensor at block 205. The first digital image may be a monoscopic image of a scene depicted from a first location communicatively coupled to a camera sensor of the detection application. In some embodiments, the first digital image may include a trajectory of the machine.
At block 210, a second digital image may be generated based on the first digital image. The second digital image may be a monoscopic image of a scene depicted from a second location different from the first location. In these or other embodiments, the second digital image is not an image captured by a camera (e.g., the camera that captured the first digital image at block 205).
At block 215, a stereoscopic image of the scene may be generated. The stereoscopic image may include a first digital image and a second digital image. In these or other embodiments, the stereo image may be an image on which detection and range determination operations may be based.
Those skilled in the art will appreciate that for this and other methods disclosed in this disclosure, the blocks of the method may be implemented in a different order. Further, these blocks are provided as examples only, and some blocks may be optional, combined into fewer blocks, or expanded into additional blocks.
For example, in some embodiments, one or more additional blocks may be included in the method 200, the one or more additional blocks including obtaining a plurality of stereo-pair images, each stereo-pair image including a first monoscopic image and a second monoscopic image, and sending the plurality of stereo-pair images as input into the graphics-based model. In this manner, the graphics-based model may be trained to learn how to generate the second digital image of block 210 based on the first digital image, in order to subsequently generate the stereoscopic image of block 215.
Alternatively or additionally, one or more additional blocks may be included in the method 200 that include sending the first digital image as input into the graphics-based model, wherein the second digital image is output from the graphics-based model based on the first digital image input into the graphics-based model and one or both of the plurality of stereo-pair images.
Alternatively or additionally, one or more further blocks may be included in the method 200, the one or more further blocks including generating a depth map including a corresponding pixel for each pixel in the stereoscopic image, each corresponding pixel in the depth map representing relative distance data from the camera sensor for each respective pixel in the stereoscopic image.
Alternatively or additionally, one or more further blocks may be included in the method 200, the one or more further blocks including associating a subset of pixels of the total amount of pixels in the depth map with the object, and obtaining the actual distance from the camera sensor to the object in the stereoscopic image based on a portion of the subset of pixels in the depth map associated with the object, using: the relative distance data of the portion associated with the object; a focal point of the first digital image and the second digital image; and a displacement factor between the first digital image and the second digital image. In some embodiments, obtaining the actual distance to the object may include determining a correction value that compensates for an offset in the distance measurement based on perceived depth in the stereoscopic image.
Alternatively or additionally, one or more additional blocks may be included in method 200, including sending an alert for presentation via the detection application when the actual distance to the object satisfies a first threshold distance; and/or causing, via the detection application, a machine communicatively coupled to the detection application to perform a corrective action when the actual distance to the object satisfies a second threshold distance. In some embodiments, the first and second threshold distances may be the same, while in other embodiments they may be different distances to the detected object. Alternatively or additionally, the first threshold distance and/or the second threshold distance may vary depending on any of a myriad of factors. For example, factors that affect the first and second threshold distances may include: a speed of the machine and/or object, a trajectory of the machine and/or object, a regulation or law, a cost/benefit analysis, a risk prediction analysis, or any other suitable type of factor that makes a given threshold distance between the machine and the detected object reasonable.
In some embodiments, the alert for presentation (e.g., on a display) via the detection application may include a visual alert signal and/or an audible alert signal. Alternatively or additionally, the detection application may cause the machine to perform corrective actions including stopping the machine, decelerating the machine, turning the machine, lowering/raising the height of the machine, performing an evasive maneuver, or any other suitable type of corrective action for mitigating damage to and/or preventing contact between the machine and the object.
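A hypothetical sketch of the two-threshold logic described above follows; the threshold values, callback names, and alert message are illustrative only and are not defined by the patent.

```python
# Hypothetical sketch: dispatching an alert and/or a corrective action based on the
# actual distance to a detected object and two threshold distances.
from dataclasses import dataclass
from typing import Callable


@dataclass
class RangingDecision:
    alert_threshold_m: float    # first threshold distance
    action_threshold_m: float   # second threshold distance (may equal the first)

    def evaluate(self, distance_m: float,
                 send_alert: Callable[[str], None],
                 corrective_action: Callable[[], None]) -> None:
        if distance_m <= self.alert_threshold_m:
            send_alert(f"Object detected {distance_m:.1f} m ahead")  # visual and/or audible
        if distance_m <= self.action_threshold_m:
            corrective_action()  # e.g., decelerate, stop, or steer the machine

# Usage sketch:
# RangingDecision(20.0, 8.0).evaluate(distance_m, display_warning, apply_brakes)
```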
Alternatively or additionally, one or more further blocks may be included in the method 200, the one or more further blocks comprising determining a presence of an object within the stereoscopic image and classifying the object based on an image recognition process performed on the object via the graph-based model. In some embodiments, determining the presence of an object within a stereoscopic image may include analysis of pixels within the stereoscopic image and/or within the depth map. For example, if a group of pixels forms a particular shape or includes a particular color or grayscale, the presence of an object may be inferred. In these or other embodiments, the identification of the object may be a separate step.
In some embodiments, the image recognition may include image recognition training of a graph-based model. For example, the graph-based model may be fed with input data (e.g., an image of an object), and the output (e.g., guess) of the graph-based model may be compared to an expected result, such as a predetermined or artificially specified label. Using additional cycles through the input data, weights, biases, and other parameters in the graph-based model can be modified to reduce the error rate of guessing. For example, weights in the graph-based model may be adjusted so that the guess better matches a predetermined or artificially specified label of the image of the object.
In these or other embodiments, the input data fed to the graph-based model for training purposes may include images of a large number of different objects. Hundreds, thousands, or millions of object images may be provided to the graphics-based model. Alternatively or additionally, the image of the object provided to the graphics-based model may include labels corresponding to one or more features, pixels, boundaries, or any other detectable aspect of the object.
In these or other embodiments, additionally or alternatively, image recognition techniques may be used with the graph-based model to classify objects. For example, image recognition techniques may include using: grayscale values; RGB (red, green, and blue) values, for example, ranging from 0 to 255; pre-processing techniques (e.g., image cropping/flipping/angle manipulation, adjusting image hue, contrast, saturation, etc.); a subset of test data or small batches of test data rather than an entire data set; and reducing the size of the image by taking the maximum pixel value within each cell of a grid (e.g., max pooling).
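The sketch below strings together several of the listed techniques (grayscale conversion, 0-255 value scaling, center cropping, a flip augmentation, and max-pooling over a pixel grid). Array shapes, the grid size, and the use of NumPy are assumptions for illustration.

```python
# Sketch of the listed preprocessing steps for image recognition input.
import numpy as np


def preprocess(image_rgb: np.ndarray, out_grid: int = 64) -> np.ndarray:
    """image_rgb: HxWx3 uint8 array with RGB values 0-255; returns a downsized grayscale grid."""
    gray = image_rgb.astype(np.float32).mean(axis=2)   # simple grayscale from RGB
    gray = gray / 255.0                                 # scale 0-255 values to 0-1
    h, w = gray.shape
    side = min(h, w)
    gray = gray[(h - side) // 2:(h + side) // 2,        # center crop to a square
                (w - side) // 2:(w + side) // 2]
    cell = side // out_grid
    gray = gray[:cell * out_grid, :cell * out_grid]
    # Reduce the image size by keeping the maximum pixel value of each grid cell (max pooling).
    return gray.reshape(out_grid, cell, out_grid, cell).max(axis=(1, 3))


def augment(image_rgb: np.ndarray) -> np.ndarray:
    """One of the mentioned manipulations: a horizontal flip."""
    return image_rgb[:, ::-1, :]
```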
Alternatively or additionally, one or more further blocks may be included in the method 200, the one or more further blocks comprising one or more of the following steps: receiving, at a first interface element of a user interface, a first user input regarding a degree of rendered stereoscopic depth in a stereoscopic image; adjusting the stereoscopic depth based on a first user input; receiving a second user input regarding an adjustment of a z-plane position of a stereoscopic image at a second interface element of the user interface; adjusting the z-plane position based on the second user input; and generating (e.g., regenerating) a stereoscopic image based on the adjustment of the stereoscopic depth and the adjustment of the z-plane position. In some embodiments, the scene may comprise a geographic scene, and/or the stereoscopic image may comprise one of a plurality of stereoscopic images of a geographic scene used in a mapping application and/or a detection and ranging application.
Alternatively or additionally, one or more further blocks may be included in the method 200, the one or more further blocks comprising one or more of the following steps: receiving, at a third interface element of the user interface, a third user input regarding a field of view size of a scene depicted in the stereoscopic image; adjusting the field of view size based on a third user input; and generating (e.g., regenerating) a stereoscopic image based on the adjustment of the field of view size.
Alternatively or additionally, one or more further blocks may be included in the method 200, the one or more further blocks comprising one or more of the following steps: receiving, at a third interface element of the user interface, a third user input regarding a pitch angle at which a scene depicted in the stereoscopic image is depicted; adjusting the pitch angle based on the third user input; and generating (e.g., regenerating) the stereoscopic image based on the adjustment of the pitch angle.
Alternatively or additionally, one or more further blocks may be included in the method 200, the one or more further blocks comprising one or more of the following steps: receiving, at a third interface element of the user interface, a third user input regarding a distance to a viewpoint from which a scene depicted in the stereoscopic image is depicted; adjusting the distance based on the third user input; and generating (e.g., regenerating) the stereoscopic image based on the adjustment of the distance.
Alternatively or additionally, one or more further blocks may be included in the method 200, the one or more further blocks comprising one or more of the following steps: receiving, at a third interface element of the user interface, a third user input regarding a scaling of an object of a scene depicted in the stereoscopic image; adjusting the zoom scale based on a third user input; and generating (e.g., regenerating) a stereoscopic image based on the scaling adjustment.
Alternatively or additionally, one or more further blocks may be included in the method 200, the one or more further blocks comprising one or more of the following steps: receiving, at a third interface element of the user interface, a third user input regarding a speed that simulates following a navigation route within the geographic scene as presented in the plurality of stereoscopic images; adjusting the speed based on a third user input; and simulating a process of following the navigation route within the geographic scene based on the adjusted speed.
Alternatively or additionally, one or more further blocks may be included in the method 200, the one or more further blocks comprising one or more of the following steps: receiving, at a third interface element of the user interface, a third user input regarding a flight mode of the mapping application, wherein the flight mode simulates flying above the ground of a geographic scene presented in a plurality of stereoscopic images; and enabling the flight mode based on a third user input.
Alternatively or additionally, one or more further blocks may be included in the method 200, the one or more further blocks comprising one or more of the following steps: receiving, at a third interface element of the user interface, a third user input regarding an aerial image mode of the mapping application, wherein the aerial image mode depicts a stereoscopic view of the geographic scene based on one or more images of the geographic scene captured by the camera; and enabling the aerial image mode based on a third user input.
Fig. 3 illustrates an example system 300 that may be used for mapping and/or detecting and ranging objects based on one or more monoscopic frames. The system 300 may be arranged in accordance with at least one embodiment described in this disclosure. Alternatively or additionally, system 300 may be configured to perform one or more aspects of method 200 described above in connection with fig. 2. System 300 may include a processor 312, a memory 314, a communication unit 316, a display 318, peripheral devices 322, and a user interface unit 320, all communicatively coupled. In some embodiments, system 300 may be part of any system or device described in the present disclosure.
In general, processor 312 may comprise any suitable special purpose or general-purpose computer, computing entity, or processing device comprising various computer hardware or software modules, and may be configured to execute instructions stored on any suitable computer-readable storage medium. For example, processor 312 may include a microprocessor, microcontroller, Digital Signal Processor (DSP), Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA), or any other digital or analog circuitry configured to interpret and/or execute program instructions and/or process data.
Although shown in fig. 3 as a single processor, it is to be understood that processor 312 may include any number of processors distributed across any number of networks or physical locations configured to perform, individually or collectively, any number of the operations described in this disclosure. In some embodiments, processor 312 may interpret and/or execute program instructions and/or process data stored in memory 314. In some embodiments, processor 312 may execute program instructions stored in memory 314.
For example, in some embodiments, processor 312 may execute program instructions stored in memory 314 related to detection and ranging based on one or more monoscopic frames. In these and other embodiments, the instructions may be used to perform one or more of the operations or functions described in this disclosure.
Memory 314 may include a computer-readable storage medium or one or more computer-readable storage media for carrying or having computer-executable instructions or data structures stored thereon. Such computer-readable storage media can be any available media that can be accessed by a general purpose or special purpose computer, such as the processor 312. By way of example, and not limitation, such computer-readable storage media can comprise non-transitory computer-readable storage media including Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), compact disc read only memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory devices (e.g., solid state memory devices), or any other storage medium that can be used to carry or store particular program code in the form of computer-executable instructions or data structures and that can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable storage media. The computer-executable instructions may include, for example, instructions and data configured to cause the processor 312 to perform a particular operation or set of operations as described in this disclosure. In these and other embodiments, the term "non-transitory" as used in this disclosure should be interpreted to exclude only those transitory media found not to fall within the scope of patentable subject matter in the Federal Circuit decision In re Nuijten, 500 F.3d 1346 (Fed. Cir. 2007). Combinations of the above should also be included within the scope of computer-readable media.
The communication unit 316 may include any component, device, system, or combination thereof configured to send or receive information over a network. In some embodiments, the communication unit 316 may communicate with other devices in other locations, other devices in the same location, or even other components in the same system. For example, communication unit 316 may include a modem, a network card (wireless or wired), an infrared communication device, a wireless communication device (e.g., an antenna), and/or a chipset (e.g., a Bluetooth device, an 802.6 device (e.g., a Metropolitan Area Network (MAN)), a Wi-Fi device, a WiMax device, a cellular communication device, etc.), among others. The communication unit 316 may allow data to be exchanged with a network and/or any other device or system described in this disclosure.
For example, the display 318 may be configured to display topologies, indicate abrupt changes in topology, indicate warning notifications, display validation performance improvement values, display weights, deviations, and the like, as well as other data as indicated by the processor 312.
Peripheral devices 322 may include one or more devices. For example, the peripheral devices may include sensors, microphones, and/or speakers, among other peripheral devices.
User interface unit 320 may include any device that allows a user to interface with system 300. For example, user interface unit 320 may include a mouse, a track pad, a keyboard, buttons, and/or a touch screen, among other devices. The user interface unit 320 may receive input from a user and provide the input to the processor 312. In some embodiments, the user interface unit 320 and the display 318 may be combined.
Alternatively or additionally, in some embodiments, the user interface unit 320 of fig. 3 may include a first interface element configured to receive a first user input regarding a degree of stereoscopic depth rendered in the stereoscopic image 108 of fig. 1A. The first interface element may be configured to direct adjustment of the stereoscopic depth based on the first user input such that the stereoscopic image 108 of fig. 1A has an adjusted stereoscopic depth based on the first user input.
The adjustment of the stereoscopic depth may include adjusting a degree of stereoscopic effect provided by the stereoscopic image 108 of fig. 1A. For example, fig. 6A shows a stereoscopic depiction 650a of a scene 652, the stereoscopic depiction 650a corresponding to a first stereoscopic image and being displayed on a display screen 654. Additionally, fig. 6B shows a stereoscopic depiction 650B of the scene 652, the stereoscopic depiction 650B corresponding to a second stereoscopic image and being displayed on the display screen 654. As shown by a comparison between fig. 6A and 6B, the first stereoscopic image and the second stereoscopic image may have different degrees of stereoscopic depth, which may correspond to different settings of the first interface element as adjusted based on user input.
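One way to picture the effect of such a depth control: if a per-pixel disparity map is available for a monoscopic frame, scaling the disparity before the left and right views are synthesized strengthens or flattens the stereoscopic effect. The following Python sketch is an illustration only under that assumption; the function names, the nearest-neighbor warp, and the depth_scale parameter are hypothetical and are not taken from the disclosure.

# Minimal sketch (not the disclosed method): a "stereoscopic depth" control
# analogous to the first interface element can scale a per-pixel disparity
# map before the left/right views are synthesized from a monoscopic frame.
import numpy as np

def synthesize_stereo_pair(image, disparity, depth_scale=1.0):
    """Shift pixels horizontally by +/- half the scaled disparity.

    image:       H x W x 3 uint8 array (monoscopic frame)
    disparity:   H x W float array, in pixels (larger = closer to the camera)
    depth_scale: user-controlled factor; >1 exaggerates depth, <1 flattens it
    """
    h, w = disparity.shape
    cols = np.arange(w)
    left = np.zeros_like(image)
    right = np.zeros_like(image)
    shift = 0.5 * depth_scale * disparity
    for y in range(h):
        # Nearest-neighbor horizontal warp; a real renderer would also handle
        # occlusions and hole filling.
        left_cols = np.clip(np.round(cols + shift[y]).astype(int), 0, w - 1)
        right_cols = np.clip(np.round(cols - shift[y]).astype(int), 0, w - 1)
        left[y, left_cols] = image[y]
        right[y, right_cols] = image[y]
    return left, right

# Example: a larger depth_scale (analogous to a larger depth setting)
# produces a stronger stereoscopic effect for the same disparity map.
img = np.zeros((4, 8, 3), dtype=np.uint8)
disp = np.ones((4, 8))
l1, r1 = synthesize_stereo_pair(img, disp, depth_scale=1.0)
l2, r2 = synthesize_stereo_pair(img, disp, depth_scale=2.0)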
As another example, the user interface unit 320 of fig. 3 may include a second interface element configured to receive a second user input regarding an adjustment of a z-plane position of the stereoscopic image 108 of fig. 1A. The second interface element may be configured to direct an adjustment of the z-plane position based on the second user input such that the z-plane of the stereoscopic image 108 of fig. 1A may have an adjusted position based on the second user input.
The adjustment of the z-plane may include an adjustment of a position of a scene depicted by the stereoscopic image 108 of fig. 1A relative to a screen on which the stereoscopic image 108 is displayed, such that a relative position of the stereoscopic effect of the stereoscopic image 108 relative to the screen may be adjusted. For example, the adjustment of the z-plane may adjust how much of the stereoscopic effect is projected off-screen or behind the screen.
For example, fig. 6C shows a stereoscopic depiction 650C of the scene 652 corresponding to a third stereoscopic image and displayed on the display screen 654. As shown in fig. 6A, the first stereoscopic image has a first z-plane position in which most of the stereoscopic effect appears behind the display screen 654. In contrast, as shown in fig. 6C, the third stereoscopic image has a third z-plane position in which most of the stereoscopic effect is projected in front of the display screen 654. The different z-plane positions may correspond to different settings of the second interface element as adjusted based on user input.
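A common way to realize such a z-plane control is to shift the two views horizontally in opposite directions, which moves the zero-parallax plane relative to the display screen. The sketch below is an illustrative assumption only; the names, the sign convention, and the use of np.roll are hypothetical and not a description of the disclosed implementation.

# Hedged sketch of a z-plane ("screen Z") style control: shifting the two
# views horizontally in opposite directions moves the zero-parallax plane,
# so more of the scene appears in front of or behind the screen.
import numpy as np

def shift_zero_parallax(left, right, screen_z_pixels):
    """Positive screen_z_pixels pushes the scene behind the screen,
    negative pulls it out in front (the sign convention is arbitrary here)."""
    half = screen_z_pixels // 2
    shifted_left = np.roll(left, -half, axis=1)
    shifted_right = np.roll(right, half, axis=1)
    return shifted_left, shifted_right

left = np.zeros((4, 8, 3), dtype=np.uint8)
right = np.zeros((4, 8, 3), dtype=np.uint8)
l_out, r_out = shift_zero_parallax(left, right, screen_z_pixels=4)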
As another example, the user interface unit 320 of fig. 3 may include a third interface element configured to receive a third user input regarding a field of view size of a scene depicted in the stereoscopic image 108 of fig. 1A. The third interface element is configured to direct an adjustment of a size of the field of view based on a third user input, such that the field of view of the scene depicted in the stereoscopic image 108 may be based on the adjustment of the field of view.
The adjustment of the field of view may include adjusting the amount of the scene that is depicted in the stereoscopic image 108, such that the portion of the scene included in the stereoscopic image 108 may be adjusted. The adjustment may include adjusting a horizontal field of view, a vertical field of view, or a combination of horizontal and vertical fields of view.
For example, fig. 6D shows a stereoscopic depiction 650D of the scene 652 corresponding to a fourth stereoscopic image and displayed on the display screen 654. As shown by the comparison between fig. 6A and 6D, the first stereoscopic image has a wider field of view than the fourth stereoscopic image, which may correspond to different settings of the third interface element as adjusted based on user input.
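In a renderer that produces the stereoscopic views, a field-of-view control typically maps to the perspective projection used for each view. The following sketch shows the standard projection-matrix construction purely as an illustration; it is a generic formulation and is not asserted to be the method of the disclosure.

# Illustrative only: a wider vertical FOV produces a wider view of the scene,
# a narrower FOV a tighter view (as in fig. 6D). Standard perspective matrix.
import math
import numpy as np

def perspective(fov_y_deg, aspect, near, far):
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return np.array([
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ])

wide = perspective(90.0, 16 / 9, 0.1, 1000.0)    # wider view of the scene
narrow = perspective(45.0, 16 / 9, 0.1, 1000.0)  # tighter view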
As another example, the user interface unit 320 of fig. 3 may include a fourth interface element configured to receive a fourth user input regarding a pitch angle at which the scene depicted in the stereoscopic image 108 of fig. 1A is depicted. The fourth interface element may be configured to direct an adjustment of the pitch angle based on the fourth user input, such that the pitch angle at which the scene is depicted in the stereoscopic image 108 is based on the adjustment of the pitch angle.
The adjustment of the pitch angle may include adjusting a viewing angle from which a scene depicted in the stereoscopic image 108 may be observed. For example, the pitch angle may be adjusted to view the scene directly from above, at a 45° angle, at a 0° angle, or at any angle in between.
For example, fig. 6E shows a stereoscopic depiction 650E of a scene 652 corresponding to the fifth stereoscopic image and displayed on a display screen 654. As shown in fig. 6A, a scene 652 may be depicted in a first stereoscopic image based on a first perspective. Further, as shown in fig. 6E, a scene 652 may be depicted in the fifth stereoscopic image based on a fifth perspective different from the first perspective. The different perspectives may be based on different settings of the fourth interface element that are adjusted based on the user input.
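As an illustration of how a pitch control could act on the rendered viewpoint, the virtual camera may be rotated about its horizontal axis before each view is rendered. The sketch below is a generic rotation-matrix example; the sign and zero conventions are assumptions and are not taken from the disclosure.

# Illustrative sketch: rotate the virtual camera about its horizontal (x) axis
# to change the pitch angle at which the scene is observed.
import math
import numpy as np

def pitch_rotation(pitch_deg):
    """Rotation about the x-axis; here 90 deg looks straight down and 0 deg
    looks along the horizon (conventions are assumptions for this sketch)."""
    a = math.radians(pitch_deg)
    return np.array([
        [1.0, 0.0, 0.0],
        [0.0, math.cos(a), -math.sin(a)],
        [0.0, math.sin(a), math.cos(a)],
    ])

overhead = pitch_rotation(90.0)  # scene viewed directly from above
oblique = pitch_rotation(45.0)   # scene viewed at a 45° angle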
As another example, the user interface unit 320 of fig. 3 may include a fifth interface element configured to receive a fifth user input regarding a distance of a viewpoint from which the scene depicted in the stereoscopic image 108 of fig. 1A is depicted. The fifth interface element may be configured to direct an adjustment of the distance based on the fifth user input, such that the distance of the viewpoint in the stereoscopic image 108 may depend on the adjustment of the distance.
The adjustment of the distance may include an adjustment of the distance between the viewpoint of the scene being viewed and the scene. For example, in some embodiments, the adjustment of the distance may include an adjustment of the altitude from which the geographic scene depicted by the stereoscopic image 108 may be viewed.
For example, fig. 6F shows a stereoscopic depiction 650F of the scene 652 corresponding to a sixth stereoscopic image and displayed on the display screen 654. As shown in fig. 6A, the scene 652 may be depicted in the first stereoscopic image based on a viewpoint at a first altitude above the ground. Additionally, as shown in fig. 6F, the scene 652 may be depicted in the sixth stereoscopic image based on a viewpoint at a second altitude above the ground, where the second altitude is higher than the first altitude. The different distances may correspond to different settings of the fifth interface element as adjusted based on user input.
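To see why such a distance control matters for a geographic scene, note that for a viewpoint looking straight down the visible ground footprint grows with altitude for a fixed field of view. The short sketch below illustrates that generic geometry under the stated assumption; it is not taken from the disclosure.

# Hedged example: ground footprint visible at a given altitude for a camera
# looking straight down, from basic trigonometry (illustration only).
import math

def ground_footprint(altitude_m, fov_deg):
    """Width of ground covered by the view at the given altitude."""
    return 2.0 * altitude_m * math.tan(math.radians(fov_deg) / 2.0)

print(ground_footprint(100.0, 60.0))  # ~115 m of ground visible
print(ground_footprint(500.0, 60.0))  # ~577 m visible from the higher viewpoint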
As another example, the user interface unit 320 of fig. 3 may include a sixth interface element configured to receive a sixth user input regarding a scaling of an object that may be depicted in the stereoscopic image 108 of fig. 1A. The sixth interface element may be configured to direct an adjustment of the scaling such that the scaling of the object depicted in the stereoscopic image 108 is based on the adjustment of the scaling.
The adjustment of the scaling may include adjusting a size of an object depicted in the stereoscopic image 108. For example, a scene may include a plurality of buildings of different heights. In some cases, one or more buildings may be so tall that depicting the heights of all buildings at full scale in the 3D rendering of the scene via the stereoscopic image 108 may obstruct the view of the scene. Adjusting the scaling may reduce the rendered height of the taller objects relative to other objects, thereby reducing the obstruction to viewing.
For example, fig. 6G shows a stereoscopic depiction 650G of the scene 652 corresponding to a seventh stereoscopic image and displayed on the display screen 654. As shown in fig. 6A, the scene 652 may include a building 660 that is taller than the other buildings. As shown in fig. 6G, the rendered height of the building 660 may be reduced. The different heights may be based on whether the scaling function is enabled or disabled based on user input.
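The height-scaling behavior described above can be pictured as compressing only the portion of an object's height above some threshold. The following sketch is an illustration only; the threshold and compression factor are hypothetical values, not parameters from the disclosure.

# Illustrative sketch: rendered heights above a threshold are compressed so
# very tall buildings do not block the rest of the scene (values are assumed).
def scaled_height(height_m, threshold_m=100.0, factor=0.5, enabled=True):
    if not enabled or height_m <= threshold_m:
        return height_m
    # Keep the first `threshold_m` meters, compress only the excess.
    return threshold_m + factor * (height_m - threshold_m)

print(scaled_height(60.0))   # 60.0  -> unchanged
print(scaled_height(300.0))  # 200.0 -> the tall building is drawn shorter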
As described above, in some embodiments, the user interface unit 320 of FIG. 3 may be included in any suitable system that may generate the stereoscopic images 108 of FIG. 1A. In these or other embodiments, the stereoscopic images 108 may depict a geographic scene, such as with respect to a mapping application and/or a detection and ranging application. Alternatively or additionally, the user interface unit 320 of fig. 3 may include one or more interface elements that may be used to depict a geographic scene.
For example, in some embodiments, a mapping application and/or a detection and ranging application may be configured to simulate traveling along a particular navigation route. In these or other embodiments, the user interface unit 320 of fig. 3 may include a seventh interface element configured to receive a seventh user input regarding a speed of simulated travel along the navigation route. The seventh interface element may be configured to direct the adjustment of the speed such that the simulated travel speed as depicted in the stereoscopic image 108 of fig. 1A is based on the adjustment of the speed.
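As an illustration, simulated travel along a navigation route can advance the viewpoint along a polyline by the user-selected speed multiplied by the frame interval, with each new position then rendered as a stereoscopic frame. The sketch below makes those assumptions explicit; the route, speed value, and frame rate are hypothetical and not taken from the disclosure.

# Hedged sketch: advance the viewpoint along a navigation route at the speed
# taken from the speed interface element (all values are illustrative).
import math

def advance_along_route(route, distance):
    """Return the point `distance` meters along a route given as (x, y) pairs."""
    remaining = distance
    for (x0, y0), (x1, y1) in zip(route, route[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if remaining <= seg:
            t = remaining / seg
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        remaining -= seg
    return route[-1]

route = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0)]
speed_mps = 15.0       # value taken from the speed interface element
frame_dt = 1.0 / 30.0  # one rendered stereoscopic frame per tick
pos = advance_along_route(route, speed_mps * frame_dt)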
In these or other embodiments, the mapping application and/or the detection and ranging application (or any other suitable system that depicts a geographic scene) may be configured to simulate flying through a geographic scene. In these or other embodiments, the user interface unit 320 of fig. 3 may include an eighth interface element configured to receive an eighth user input regarding a flight mode. The eighth interface element may be configured to indicate enabling or disabling the flight mode based on an eighth user input.
In these or other embodiments, the mapping application and/or the detection and ranging application (or any other suitable system depicting a geographic scene) may be configured to generate the stereoscopic image 108 of fig. 1A based on one or more images of the geographic scene that may be captured by one or more cameras such that the mapping application and/or the detection and ranging application may depict a stereoscopic view of the geographic scene based on the images. Alternatively or additionally, the mapping application and/or the detection and ranging application may be configured to generate the stereoscopic image 108 based on the mapping such that the mapping application and/or the detection and ranging application may depict a rendered stereoscopic view of the geographic scene. In these or other embodiments, the user interface unit 320 of fig. 3 may include a ninth interface element configured to receive a ninth user input regarding an aerial image mode or a map mode. The ninth interface element may be configured to direct enabling or disabling of the aerial image mode or the map mode based on a ninth user input.
As described above, the user interface unit 320 of fig. 3 may thus be configured to control one or more parameters of the stereoscopic image 108 of fig. 1A based on user input, such that the stereoscopic image 108 may be custom generated.
Modifications, additions, or omissions may be made to system 300 without departing from the scope of the disclosure. For example, in some embodiments, system 300 may include any number of other components not explicitly shown or described. Further, depending on the particular implementation, system 300 may not include one or more of the illustrated and described components. In addition, as a further example, the number and type of interface elements included in the user interface unit 320 of fig. 3 may vary. In addition, although the terms "first," "second," "third," "fourth," etc. are used with respect to interface elements, user inputs, etc., these terms do not necessarily imply a particular order or number of elements, but are used merely to simplify description. For example, in some embodiments, the user interface unit 320 of fig. 3 may include the first interface element, the second interface element, and the fourth interface element without the third interface element. In addition, the effects of the user interface unit 320 of fig. 3 shown and described with respect to fig. 6A-6G are not necessarily drawn to scale or depicted as actual renderings, but are merely used to help improve understanding of the present disclosure. Furthermore, although one or more features of the user interface unit 320 of fig. 3 have been described with respect to mapping and/or detection and ranging applications, the user interface unit 320 is not limited to these applications.
Fig. 7 illustrates an example embodiment of a visual depiction, on a screen 754, of a user interface 722, wherein the user interface 722 may be configured to control adjustment of stereoscopic images. In some embodiments, user interface 722 may be an example of the user interface unit 320 of fig. 3. In this example, the user interface 722 may correspond to a mapping application and/or a detection and ranging application. However, one or more interface elements of user interface 722 may be used in implementations unrelated to mapping applications and/or detection and ranging applications.
In some embodiments, user interface 722 may include an "interocular distance" (IOD) interface element, which may correspond to the first interface element described above with respect to fig. 6A and 6B. As such, the IOD interface element may be configured to receive a first user input regarding a degree of stereoscopic depth, and may be configured to direct adjustment of the stereoscopic depth based on the first user input.
In the example shown, the IOD interface elements may include IOD element 762a and IOD element 762 b. IOD element 762a may include a field in which a user may enter a number indicating an amount of stereoscopic depth as a first user input via an input device. IOD element 762b may include a slider bar with a slider, where a user may move the slider as a first user input via an input device to adjust an amount of stereoscopic depth. In some embodiments, the IOD interface element may be configured such that movement of the slider of IOD element 762b may cause the value populated within the field of IOD element 762a to automatically change to correspond to the position of the slider. Alternatively or additionally, the IOD interface element may be configured such that a change in the value within the field of the IOD element 762a may cause the slider of the IOD element 762b to move to a position corresponding to the value within the field.
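The paired field-and-slider behavior described for the IOD element (and repeated for the screen Z, FOV, pitch angle, distance, and speed elements below) can be illustrated with a small example that binds both widgets to one shared value. The sketch uses the Python standard library's tkinter purely for illustration; the widget layout, value range, and variable names are assumptions, not taken from the disclosure.

# Minimal sketch (not from the disclosure): a numeric field and a slider bar
# bound to a single shared variable, so editing either control updates the
# other, as described for elements 762a and 762b.
import tkinter as tk

def build_iod_controls(root):
    """Create a numeric field and a slider bound to one shared value."""
    iod = tk.DoubleVar(master=root, value=1.0)  # current depth setting

    def on_change(*_):
        try:
            value = iod.get()
        except tk.TclError:
            return  # the field holds a partially typed number
        # A real application would re-render the stereoscopic image here.
        print("IOD set to", value)

    iod.trace_add("write", on_change)

    entry = tk.Entry(root, textvariable=iod, width=8)         # like element 762a
    slider = tk.Scale(root, variable=iod, from_=0.0, to=5.0,
                      resolution=0.1, orient=tk.HORIZONTAL)   # like element 762b
    entry.pack(side=tk.LEFT, padx=4)
    slider.pack(side=tk.LEFT, padx=4)
    return iod

if __name__ == "__main__":
    root = tk.Tk()
    root.title("IOD control (illustrative)")
    build_iod_controls(root)
    root.mainloop()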
The embodiment of the IOD interface element is merely an example. For example, in some embodiments, the IOD interface elements may include only IOD element 762a or only IOD element 762 b. In these or other embodiments, the IOD interface element may comprise another type of interface element.
In some embodiments, user interface 722 may include a "screen Z" interface element, which may correspond to the second interface element described above with respect to fig. 6A and 6C. As such, the screen Z interface element may be configured to receive a second user input regarding the z-plane position, and may be configured to direct an adjustment of the z-plane position based on the second user input.
In the example shown, the screen Z interface elements may include a screen Z element 764a and a screen Z element 764 b. The screen Z element 764a may include a field where the user may enter a number indicating the position of the z-plane as a second user input via the input device. The screen Z element 764b may include a slider bar with a slider, where the user may move the slider as a second user input via the input device to adjust the position of the z-plane. In some embodiments, the screen Z interface element may be configured such that movement of the slider of the screen Z element 764b may cause the value populated in the field of the screen Z element 764a to automatically change to correspond with the position of the slider. Alternatively or additionally, the screen Z interface element may be configured such that a change in a value within a field of the screen Z element 764a may cause the slider of the screen Z element 764b to move to a position corresponding to the value within the field.
The illustrated embodiment of the screen Z interface element is merely an example. For example, in some embodiments, the screen Z interface element may include only screen Z element 764a or only screen Z element 764 b. In these or other embodiments, the screen Z interface element may comprise another type of interface element.
In some embodiments, the user interface 722 may include a "field of view" (FOV) interface element, which may correspond to the third interface element described above with respect to fig. 6A and 6D. As such, the FOV interface element may be configured to receive a third user input regarding the size of the field of view, and may be configured to direct the adjustment of the field of view based on the third user input.
In the example shown, the FOV interface element may include FOV element 760a and FOV element 760 b. The FOV element 760a may include a field where the user may enter a number indicating the size of the field of view as a third user input via the input device. The FOV element 760b may include a slider bar with a slider, where the user may move the slider as a third user input via the input device to resize the field of view. In some embodiments, the FOV interface element may be configured such that movement of the slider of FOV element 760b may cause the value populated within the field of FOV element 760a to automatically change to correspond to the position of the slider. Alternatively or additionally, the FOV interface element may be configured such that a change in a value within a field of FOV element 760a may cause the slider of FOV element 760b to move to a position corresponding to the value within the field.
The illustrated embodiment of the FOV interface element is merely an example. For example, in some embodiments, the FOV interface element may include only FOV element 760a or only FOV element 760 b. In these or other embodiments, the FOV interface element may comprise another type of interface element.
In some embodiments, the user interface 722 may include a "pitch angle" interface element, which may correspond to the fourth interface element described above with respect to fig. 6A and 6E. As such, the pitch interface element may be configured to receive a fourth user input regarding the viewing angle, and may be configured to direct the adjustment of the viewing angle based on the fourth user input.
In the example shown, the pitch interface elements may include a pitch element 766a and a pitch element 766 b. The pitch angle element 766a may include a field where a user may enter a number indicating a viewing angle as a fourth user input via an input device. The pitch angle element 766b may include a slider bar with a slider, wherein a user may move the slider as a fourth user input via the input device to adjust the viewing angle. In some embodiments, the pitch interface element may be configured such that movement of the slider of the pitch element 766b may cause the value populated in the field of the pitch element 766a to automatically change to correspond to the position of the slider. Alternatively or additionally, the pitch interface element may be configured such that a change in the value within the field of the pitch element 766a may cause the slider of the pitch element 766b to move to a position corresponding to the value within the field.
The illustrated embodiment of the pitch interface element is merely an example. For example, in some embodiments, the pitch interface element may include only the pitch element 766a or only the pitch element 766 b. In these or other embodiments, the pitch angle interface element may comprise another type of interface element.
In some embodiments, the user interface 722 may include a "distance" interface element, which may correspond to a fifth interface element, which may be configured to receive a fifth user input regarding viewing distance, and may be configured to direct an adjustment of the viewing distance based on the fifth user input.
In the example shown, the distance interface elements may include distance element 768a and distance element 768 b. Distance element 768a may include a field in which a user may enter a number indicating a viewing distance as a fifth user input via an input device. Distance element 768b may include a slider bar with a slider, where a user may move the slider as a fifth user input via the input device to adjust the viewing distance. In some embodiments, the distance interface element may be configured such that movement of the slider of distance element 768b may cause the value populated within the field of distance element 768a to automatically change to correspond with the position of the slider. Alternatively or additionally, the distance interface element may be configured such that a change in a value within a field of distance element 768a may cause the slider of distance element 768b to move to a position corresponding to the value within the field.
The illustrated embodiment of the distance interface element is merely an example. For example, in some embodiments, the distance interface element may include only distance element 768a or only distance element 768 b. In these or other embodiments, the distance interface element may comprise another type of interface element.
In some embodiments, user interface 722 may include a "zoom" interface element, which may correspond to a sixth interface element, which may be configured to receive a sixth user input regarding the zoom ratio of the object, and may be configured to direct the adjustment of the zoom ratio based on the sixth user input.
In the example shown, the zoom scale interface element 772 may include a selection button that, in response to being selected, may toggle between scaling and not scaling objects. The illustrated embodiment of the scaling interface element 772 is an example only. For example, in some embodiments, the zoom scale interface element may include a slider bar or a field that may allow the user to adjust the amount of zoom. In these or other embodiments, the scaling interface element may comprise another type of interface element.
In some embodiments, the user interface 722 may include a "speed" interface element, which may correspond to a seventh interface element, configured to receive a seventh user input regarding a speed for following the navigation route, and may be configured to direct an adjustment of the speed based on the seventh user input.
In the example shown, the speed interface elements may include a speed element 770a and a speed element 770 b. The speed element 770a may include a field where the user may enter a number indicating the speed as a seventh user input via the input device. The speed element 770b may comprise a slider bar with a slider, wherein the user may move the slider as a seventh user input via the input device to adjust the speed. In some embodiments, the speed interface element may be configured such that movement of the slider of speed element 770b may cause the value populated in the field of speed element 770a to automatically change to correspond to the position of the slider. Alternatively or additionally, the speed interface element may be configured such that a change in the value within a field of speed element 770a may cause the slider of speed element 770b to move to a position corresponding to the value within the field.
The illustrated embodiment of the speed interface element is merely an example. For example, in some embodiments, the speed interface elements may include only speed element 770a or only speed element 770 b. In these or other embodiments, the speed interface element may comprise another type of interface element.
In some embodiments, user interface 722 may include a "flight" interface element 780, which may correspond to an eighth interface element. As such, the flight interface element may be configured to receive an eighth user input regarding enablement of the flight mode, and may be configured to direct enablement or disablement of the flight mode based on the eighth user input.
In the example shown, the flight interface element 780 can include a selection button, wherein the selection button can switch between enabling and disabling the flight mode in response to being selected. The illustrated embodiment of the flight interface element 780 is merely an example. For example, the flight interface element may include another type of interface element.
In some embodiments, the user interface 722 may include an "aerial image" interface element 778, which may correspond to a ninth interface element, which may be configured to receive a ninth user input regarding an aerial image mode or map mode, and may be configured to direct the enabling or disabling of the aerial image mode or map mode based on the ninth user input.
In the example shown, the aerial image interface element 778 may include a selection button that is switchable between an aerial image mode and a map mode in response to being selected. The illustrated embodiment of the aerial image interface element 778 is merely an example. For example, the aerial image interface element may comprise another type of interface element.
In some embodiments, the user interface 722 may include a "reset defaults" interface element 774, which in response to being selected may restore one or more of the above parameters (e.g., FOV, IOD, screen Z, pitch, distance, speed, zoom, aerial image, flight, etc.) to default settings. In the example shown, reset defaults interface element 774 may include a select button that, in response to being selected, may reset the defaults. The illustrated embodiment of the reset default element 774 is merely an example. For example, the reset default interface element may include another type of interface element.
In some embodiments, user interface 722 may include a "confirm" interface element 776, which in response to being selected may save selected settings for the above-described parameters. Alternatively or additionally, in response to selecting confirmation interface element 776, the menu screen of user interface 722 may be exited. In the example shown, confirmation interface element 776 may comprise a selection button, wherein the selection button saves settings and exits the menu in response to being selected. The illustrated embodiment of the confirmation interface element 776 is merely an example. For example, the confirmation interface element may include another type of interface element.
In some embodiments, user interface 722 may include a "cancel" interface element 782, which cancel interface element 782 may cancel any changes that have been made to the selected settings for the above-described parameters in response to being selected. Alternatively or additionally, a menu screen of user interface 722 may be exited in response to selection of cancel interface element 782. In the example shown, cancel interface element 782 may include a selection button that may cancel any changes and may exit the menu in response to being selected. The illustrated embodiment of the cancel interface element is merely an example. For example, the cancel interface element may comprise another type of interface element.
Modifications, additions, or omissions may be made to fig. 7 without departing from the scope of the present disclosure. For example, the number and types of interface elements included in user interface 722 may vary. Further, although one or more features of the user interface 722 are described with respect to a mapping application and/or a detection and ranging application, the user interface 722 is not limited to only mapping applications and/or detection and ranging applications.
In accordance with common practice, the various features shown in the drawings may not be drawn to scale. The illustrations presented in this disclosure are not meant to be actual views of any particular apparatus (e.g., device, system, etc.) or method, but are merely idealized representations which are employed to describe various embodiments of the present disclosure. Accordingly, the dimensions of the various features may be arbitrarily expanded or reduced for clarity. In addition, some of the drawings may be simplified for clarity. Accordingly, the drawings may not depict all of the components of a given apparatus (e.g., device) or all of the operations of a particular method.
As used herein, the terms, especially in the claims (e.g., bodies of the claims) are generally intended as "open" terms (e.g., the term "including" should be interpreted as "including, but not limited to," the term "having" should be interpreted as "having at least," the term "includes" should be interpreted as "includes, but is not limited to," etc.).
Furthermore, if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases "at least one" and "one or more" to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by an indefinite article limits any particular claim containing such an introduced recitation to embodiments containing only one such recitation, even when the same claim includes the introductory phrases "one or more" or "at least one" and an indefinite article such as "a" (e.g., "a" should be interpreted to mean "at least one" or "one or more"); the same holds true for the use of definite articles used to introduce claim recitations.
In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number (e.g., the bare recitation of "two recitations," without other modifiers, means at least two recitations, or two or more recitations). Further, where a convention analogous to "at least one of A, B and C, etc." or "one or more of A, B and C, etc." is used, in general such a construction is intended to encompass a alone, B alone, C, A and B together, a and C together, B and C together, or A, B and C together, etc. For example, use of the term "and/or" is intended to be interpreted in this manner. In addition, the terms "about", "substantially" and "approximately" should be interpreted to mean a value within 10% of the actual value, such as, for example, a value of 3mm or 100% (percent).
Furthermore, any disjunctive word or phrase presenting two or more alternative terms, whether in the specification, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase "A or B" should be understood to include the possibilities of "A", "B", or "A and B".
However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite article "a" limits any particular claim containing such a recitation to embodiments containing only one such recitation, even when the same claim includes the introductory phrases "one or more" or "at least one" and the indefinite article "a" (e.g., "a" should be interpreted to mean "at least one" or "one or more"); the same holds true for the use of definite articles used to introduce claim recitations.
In addition, the use of the terms first, second, third, etc. herein is not necessarily intended to denote a particular order or quantity of elements. In general, the terms "first," "second," "third," and the like are used to distinguish between different elements as a general identifier. Unless the terms "first," "second," "third," etc. are shown to denote a particular order, these terms should not be construed as indicating a particular order. Moreover, unless the terms "first," "second," "third," etc. are shown to denote a particular number of elements, these terms should not be construed as indicating a particular number of elements. For example, a first widget may be described as having a first side and a second widget may be described as having a second side. Rather than representing the second widget as having two sides, the use of the term "second side" with respect to the second widget may distinguish this side of the second widget from the "first side" of the first widget.
All examples and conditional language recited herein are intended for pedagogical objects to aid the reader in understanding the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Although the embodiments of the present disclosure have been described in detail, it should be understood that various changes, substitutions, and alterations can be made hereto without departing from the spirit and scope of the disclosure.

Claims (40)

1. A method, comprising:
obtaining, via a detection application, a first digital image comprising a monoscopic image depicting a scene from a first location of a camera sensor communicatively coupled to the detection application;
generating, based on the first digital image, a second digital image that is monoscopic and that depicts the scene from a second location different from the first location; and
generating a stereoscopic image of the scene that includes the first digital image and the second digital image.
2. The method of claim 1, further comprising:
acquiring a plurality of stereo pair images, wherein the stereo pair images comprise a first single-view image and a second single-view image;
sending the plurality of stereo-pair images as input into a graphics-based model.
3. The method of claim 2, further comprising:
sending the first digital image as input into the graphics-based model, wherein a second digital image is output from the graphics-based model based on the first digital image input into the graphics-based model and one or both of the plurality of stereo-pair images.
4. The method of claim 1, further comprising:
generating a depth map comprising a corresponding pixel for each pixel in the stereoscopic image, each corresponding pixel in the depth map representing relative distance data from the camera sensor for the respective pixel in the stereoscopic image.
5. The method of claim 4, further comprising:
associating a subset of pixels of a total number of pixels in the depth map as a representation of an object;
based on a portion of a subset of pixels in the depth map associated with an object, obtaining an actual distance from the camera sensor to the object in the stereoscopic image using:
relative distance data of a portion associated with the object;
a focal length of the first digital image and the second digital image; and
a displacement factor between the first digital image and the second digital image.
6. The method of claim 5, further comprising:
sending, via the detection application, an alert for presentation when an actual distance to an object satisfies a first threshold distance; or
causing, via the detection application, a machine communicatively coupled to the detection application to perform a corrective action when an actual distance to an object satisfies a second threshold distance.
7. The method of claim 5, wherein the step of obtaining the actual distance to the object comprises determining a correction value that compensates for the offset in the distance measurement based on the perceived depth in the stereoscopic image.
8. The method of claim 1, further comprising:
determining a presence of an object within the stereoscopic image; and
the object is classified based on an image recognition process of the object via a graph-based model.
9. The method of claim 1, wherein the first digital image comprises a trajectory of the machine.
10. The method of claim 1, further comprising:
receiving, at a first interface element of a user interface, a first user input regarding a degree of rendered stereoscopic depth in the stereoscopic image;
adjusting the stereoscopic depth based on a first user input;
receiving a second user input regarding an adjustment of a z-plane position of the stereoscopic image at a second interface element of the user interface;
adjusting the z-plane position based on a second user input; and
generating the stereoscopic image based on the adjustment of the stereoscopic depth and the adjustment of the z-plane position.
11. The method of claim 10, further comprising:
receiving, at a third interface element of the user interface, a third user input regarding a field of view size of a scene depicted in the stereoscopic image;
adjusting the field of view size based on a third user input; and
generating the stereoscopic image based on the adjustment of the field of view size.
12. The method of claim 10, further comprising:
receiving, at a third interface element of the user interface, a third user input regarding a pitch angle at which a scene depicted in the stereoscopic image is depicted;
adjusting the pitch angle based on a third user input; and
generating the stereoscopic image based on the adjustment of the pitch angle.
13. The method of claim 10, further comprising:
receiving a third user input regarding a distance of a viewpoint from which a scene depicted in the stereoscopic image is depicted at a third interface element of the user interface;
adjusting the distance based on a third user input; and
generating the stereoscopic image based on the adjustment of the distance.
14. The method of claim 10, further comprising:
receiving, at a third interface element of the user interface, a third user input regarding a scaling of an object of a scene depicted in the stereoscopic image;
adjusting the zoom scale based on a third user input; and
generating the stereoscopic image based on the adjustment of the scaling.
15. The method of claim 10, wherein the scene is a geographic scene and the stereoscopic image is one of a plurality of stereoscopic images of the geographic scene used in a mapping application.
16. The method of claim 15, further comprising:
receiving, at a third interface element of the user interface, a third user input regarding a speed for simulating following a navigation route within the geographic scene presented in the plurality of stereoscopic images;
adjusting the speed based on a third user input; and
simulating a process of following a navigation route in the geographic scene based on the adjusted speed.
17. The method of claim 15, further comprising:
receiving, at a third interface element of the user interface, a third user input regarding a flight mode of a mapping application that simulates flying above the ground of a geographic scene presented in the plurality of stereoscopic images; and
the flight mode is enabled based on a third user input.
18. The method of claim 15, further comprising:
receiving, at a third interface element of the user interface, a third user input regarding an aerial image mode of a mapping application that depicts a stereoscopic view of a geographic scene based on one or more images of the geographic scene captured by the camera sensor; and
the aerial image mode is enabled based on a third user input.
19. The method of claim 1, wherein the step of generating a second digital image comprises capturing the second digital image via the camera sensor.
20. The method of claim 19, wherein the step of capturing a second digital image comprises moving the camera sensor from a first position to a second position.
21. A system, comprising:
a display;
a processor coupled to the display and configured to direct presentation of data on the display; and
at least one non-transitory computer-readable medium communicatively coupled to the processor and configured to store one or more instructions that, when executed by the processor, cause or direct a system to perform operations comprising:
obtaining, via a camera sensor associated with a machine, a first digital image comprising a monoscopic image depicting a first area of a scene from a first location of the camera sensor communicatively coupled to the machine;
generating, based on the first digital image, a second digital image that is monoscopic and that depicts the scene from a second location different from the first location; and
generating a stereoscopic image of the scene that includes the first digital image and the second digital image.
22. The system of claim 21, wherein the operations further comprise:
generating a depth map comprising a corresponding pixel for each pixel in the stereoscopic image, each corresponding pixel in the depth map representing relative distance data from the camera sensor for the respective pixel in the stereoscopic image.
23. The system of claim 22, wherein the operations further comprise:
associating a subset of pixels of a total number of pixels in the depth map as a representation of an object;
based on a portion of a subset of pixels in the depth map associated with an object, obtaining an actual distance from the camera sensor to the object in the stereoscopic image using:
relative distance data of a portion associated with the object;
a focal length of the first digital image and the second digital image; and
a displacement factor between the first digital image and the second digital image.
24. The system of claim 23, wherein the operations further comprise:
sending a warning for presentation via the detection application onto the display when the actual distance to the object satisfies a first threshold distance; or
causing, via the detection application, a machine communicatively coupled to the detection application to perform a corrective action when the actual distance to the object satisfies a second threshold distance.
25. The system of claim 23, wherein the step of obtaining the actual distance to the object comprises determining a correction value that compensates for the offset in the distance measurement based on the perceived depth in the stereoscopic image.
26. The system of claim 21, wherein the operations further comprise:
determining a presence of an object within the stereoscopic image; and
classifying the object based on an image recognition process of the object via a graph-based model.
27. The system of claim 21, wherein the first digital image comprises a trajectory of the machine.
28. The system of claim 21, wherein the operations further comprise:
acquiring a plurality of stereo pair images, wherein the stereo pair images comprise a first single-view image and a second single-view image;
sending the plurality of stereo-pair images as input into a graphics-based model.
29. The system of claim 28, wherein the operations further comprise:
sending a first digital image as input into the graphics-based model, wherein the second digital image is output from the graphics-based model based on the first digital image input into the graphics-based model and one or both of the plurality of stereo-pair images.
30. The system of claim 21, wherein the display comprises a user interface configured to control adjustment of the stereoscopic image, the user interface comprising:
a first interface element configured to:
receiving a first user input regarding a degree of rendered stereoscopic depth in the stereoscopic image; and
directing adjustment of the stereoscopic depth based on the first user input;
a second interface element configured to:
receiving a second user input regarding an adjustment of a z-plane position of the stereoscopic image; and
directing an adjustment of the z-plane position based on the second user input.
31. The system of claim 30, further comprising:
a third interface element configured to:
receiving a third user input regarding a field of view size of a scene depicted in the stereoscopic image; and
directing an adjustment of a size of the field of view based on a third user input.
32. The system of claim 30, further comprising:
a third interface element configured to:
receiving a third user input regarding a pitch angle at which a scene depicted in the stereoscopic image is depicted; and
directing an adjustment of the pitch angle based on a third user input.
33. The system of claim 30, further comprising:
a third interface element configured to:
receiving a third user input regarding a distance of a viewpoint from which a scene depicted in the stereoscopic image is depicted; and
directing an adjustment of the distance based on a third user input.
34. The system of claim 30, further comprising:
a third interface element configured to:
receiving a third user input regarding a scaling of an object of a scene depicted in the stereoscopic image; and
directing an adjustment of the zoom scale based on a third user input.
35. The system of claim 30, wherein the scene is a geographic scene and the stereoscopic image is one of a plurality of stereoscopic images of the geographic scene used in a mapping application.
36. The system of claim 35, further comprising:
a third interface element configured to:
receiving a third user input regarding a speed at which to simulate following a navigation route within the geographic scene presented in the plurality of stereoscopic images; and
directing an adjustment of the speed based on a third user input.
37. The system of claim 35, further comprising:
a third interface element configured to:
receiving a third user input regarding a flight mode of a mapping application, wherein the flight mode simulates flying above the ground of a geographic scene presented in the plurality of stereoscopic images; and
directing enablement of the flight mode based on a third user input.
38. The system of claim 35, further comprising:
a third interface element configured to:
receiving a third user input regarding an aerial image mode of a mapping application, wherein the aerial image mode depicts a stereoscopic view of a geographic scene based on one or more images of the geographic scene captured by the camera sensor; and
directing enablement of the aerial image mode based on a third user input.
39. The system of claim 21, wherein the operation of generating a second digital image comprises capturing the second digital image via the camera sensor.
40. The system of claim 39, wherein the step of capturing a second digital image comprises moving the camera sensor from the first position to the second position.
CN201910328119.4A 2019-01-25 2019-04-23 Detection and ranging based on one or more monoscopic frames Pending CN111491154A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962797114P 2019-01-25 2019-01-25
US62/797,114 2019-01-25

Publications (1)

Publication Number Publication Date
CN111491154A (en) 2020-08-04

Family

ID=71794267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910328119.4A Pending CN111491154A (en) 2019-01-25 2019-04-23 Detection and ranging based on one or more monoscopic frames

Country Status (4)

Country Link
EP (1) EP4229602A2 (en)
JP (1) JP2022518532A (en)
CN (1) CN111491154A (en)
WO (1) WO2020231484A2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023214790A1 (en) * 2022-05-04 2023-11-09 한화비전 주식회사 Apparatus and method for image analysis
WO2024063242A1 (en) * 2022-09-20 2024-03-28 한화비전 주식회사 Apparatus and method for image analysis

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103609104A (en) * 2011-05-23 2014-02-26 高通股份有限公司 Interactive user interface for stereoscopic effect adjustment
US20170094252A1 (en) * 2015-09-25 2017-03-30 Amit Bleiweiss Video feature tagging
US20170277980A1 (en) * 2013-08-02 2017-09-28 Xactware Solutions, Inc. System and Method for Detecting Features in Aerial Images Using Disparity Mapping and Segmentation Techniques
US20170316605A1 (en) * 2015-02-10 2017-11-02 Dreamworks Animation Llc Generation of three-dimensional imagery from a two-dimensional image using a depth map
CN107580209A (en) * 2017-10-24 2018-01-12 维沃移动通信有限公司 Take pictures imaging method and the device of a kind of mobile terminal


Also Published As

Publication number Publication date
JP2022518532A (en) 2022-03-15
WO2020231484A2 (en) 2020-11-19
EP4229602A2 (en) 2023-08-23
WO2020231484A3 (en) 2021-02-25

Similar Documents

Publication Publication Date Title
US11106203B2 (en) Systems and methods for augmented stereoscopic display
US10210401B2 (en) Real time multi dimensional image fusing
CN108419446B (en) System and method for laser depth map sampling
CN107871405B (en) Detection and assessment of air crash threats using visual information
US9488471B2 (en) Methods and systems for navigation and terrain change detection
US10839612B1 (en) Method and system for visualizing overlays in virtual environments
CN106444837A (en) Obstacle avoiding method and obstacle avoiding system for unmanned aerial vehicle
JP2020522002A (en) Method and system for recognizing, tracking, and focusing a moving target
US11353891B2 (en) Target tracking method and apparatus
JP2018504652A (en) Prominent feature based mobile positioning
CN108140245B (en) Distance measurement method and device and unmanned aerial vehicle
US11595634B2 (en) Detection and ranging based on a single monoscopic frame
US20200012756A1 (en) Vision simulation system for simulating operations of a movable platform
KR102337352B1 (en) Method and device for providing advanced pedestrian assistance system to protect pedestrian preoccupied with smartphone
KR20180133745A (en) Flying object identification system using lidar sensors and pan/tilt zoom cameras and method for controlling the same
CN114526710A (en) Sea surface measuring system, sea surface measuring method, and storage medium
CN111491154A (en) Detection and ranging based on one or more monoscopic frames
CN107607939B (en) Optical target tracking and positioning radar device based on real map and image
JP7143103B2 (en) Route display device
Kim et al. Detecting and localizing objects on an unmanned aerial system (uas) integrated with a mobile device
JP7130409B2 (en) Control device
Liu et al. GPS availability prediction based on air-ground collaboration
JP7109963B2 (en) Route evaluation device
Benoit et al. Eyes-Out Airborne Object Detector for Pilots Situational Awareness
JP2021033934A (en) Method, device and program for generating multi-viewpoint image using image of movable image generation origin

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (Application publication date: 20200804)