US11004233B1 - Intelligent vision-based detection and ranging system and method - Google Patents

Intelligent vision-based detection and ranging system and method

Info

Publication number
US11004233B1
US11004233B1 (application US16/864,914; US202016864914A)
Authority
US
United States
Prior art keywords
vision system
neural network
stereo vision
view
field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US16/864,914
Inventor
Ynjiun Paul Wang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to US16/864,914
Application granted
Publication of US11004233B1
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G06K9/00798
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588 Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10012 Stereo images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20228 Disparity calculation for image-based rendering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30248 Vehicle exterior or interior
    • G06T2207/30252 Vehicle exterior; Vicinity of vehicle

Abstract

An intelligent vision-based detection and ranging (iVidar) system consists of at least four cameras forming at least two stereo vision systems. The first pair of cameras forms a first stereo vision system with a field of view A1 and a baseline B1. The second pair of cameras forms a second stereo vision system with a field of view A2 and a baseline B2, where B2 is greater than B1 and A2 is smaller than A1. In one preferred configuration of the current invention, the second stereo vision system is mounted so that its field of view A2 covers a perspective vanishing point in the along-track direction.

Description

BACKGROUND
For automotive vision sensors, image processing of road scenes can provide rich information about the road environment that other sensors (such as radar or lidar) might fail to obtain. Today, cameras are used in certain vehicles for lane detection and offer lane keeping assistance or lane departure warning systems. Furthermore, some vehicles already provide automatic traffic sign recognition systems that can inform a driver about the speed limit or other types of road conditions. In addition, with recent advances in artificial intelligence, cameras are able to detect pedestrians or cyclists, which radar or lidar sensors might otherwise fail to recognize.
Generally, for a camera, it is relatively easy to measure a cross-track (lateral) distance given adequate camera resolution. However, current camera systems with vision processing are less effective at measuring the along-track (longitudinal) distance of an object. A monocular system uses only one camera and exploits the geometry of the road scene, along with knowledge about the size of cars, for example, to estimate the along-track distance. A stereo vision-based system, on the other hand, uses two cameras to estimate the 3D coordinates of an object directly by computing the epipolar line disparity, similar to how a human perceives distance through two eyes.
Current vision-based camera systems have sensing ranges from up to 3 m to up to 80 m and fields of view (FOV) from around 20 degrees to 180 degrees, depending on the sensing range. Usually, the wider the FOV, the shorter the sensing range. In a stereo vision-based system, the depth capability drops rapidly with distance and is limited to a range of up to around 40 m. Another limitation of current vision-based detection and ranging systems is accuracy: while radar and lidar have rather constant ranging errors over distance, the range accuracy of camera systems typically degrades quadratically with distance. For example, at 3 m a range error of 5 cm is typically achievable, while the error at 40 m might grow to around 3 m, depending on the camera focal length and pixel size.
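The quadratic error growth follows from the pinhole stereo relation Z = f·B/d: a fixed disparity error maps to a depth error of roughly Z²·Δd/(f·B). The following Python sketch illustrates this; the focal length, baseline and disparity-noise values are editorial assumptions, not figures from this patent.

```python
def stereo_depth_error(z_m, focal_px, baseline_m, disparity_noise_px=0.25):
    """Approximate stereo ranging error at distance z_m.

    From Z = f*B/d, a small disparity error dd maps to a depth error
    dZ ~= Z^2 * dd / (f * B), i.e. the error grows quadratically with Z.
    """
    return (z_m ** 2) * disparity_noise_px / (focal_px * baseline_m)

# Illustrative numbers: 1000 px focal length, 10 cm baseline, 1/4-pixel disparity noise
print(stereo_depth_error(3.0, 1000, 0.10))   # ~0.02 m error at 3 m
print(stereo_depth_error(40.0, 1000, 0.10))  # ~4 m error at 40 m
```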
In the prior art disclosed by Adomat et al. in U.S. Pat. No. 10,334,234, a "stereo" system is described with two monocular cameras for a vehicle having two different fields of view (FOV) in order to extend the sensing range. One camera has a smaller FOV to provide more resolution for far-distance sensing and serves as a fixed zoom camera, while the other camera has a larger FOV to provide more coverage but shorter-distance sensing. However, this embodiment is fundamentally not a stereo vision-based system. A stereo vision-based system requires both cameras to cover more or less the same (overlapping) FOV at the same pixel resolution to enable stereo-pair epipolar line disparity computation. In that disclosure, only the center portion of the field of view forms an overlap area between the two cameras' FOVs. If stereo-pair epipolar line disparity computation were performed in this overlap area, it would also require reducing the pixel resolution of the zoom camera (smaller FOV) to the lower resolution of the other camera, rendering the zoom effect void. In summary, if the disclosed configuration is used for "stereo" vision, it does not extend the sensing range beyond that of the shorter-distance (larger FOV) sensing camera.
In another prior art disclosed by Schmiedel et al. in U.S. Pat. No. 8,139,109, a vehicle management system is described that consists of at least a first, a second, and a third video camera. The system is configured to receive input from a combination of at least two cameras, selected from among the first, second, and third cameras based on an event associated with a vehicle, to provide at least a first, a second, and a third stereo ranging depth. This disclosure uses the different baselines of the stereo pairs to extend the sensing range. However, all three cameras have the same FOV (or focal length), which limits the range extension to the ratio of the longest baseline to the shortest baseline. For example, in that disclosure the longest baseline is 1.5 m and the shortest baseline is 0.5 m, so the distance (range) extension can only be up to 3×.
SUMMARY OF THE INVENTION
In one aspect, an intelligent vision-based detection and ranging (iVidar) system consists of at least four cameras forming at least two stereo vision systems. The first pair of cameras forms a first stereo vision system with a field of view A1 at infinity and a baseline B1. The second pair of cameras forms a second stereo vision system with a field of view A2 at infinity and a baseline B2, where B2 is greater than B1 and A2 is smaller than A1.
Also, the field of view A1 of the first stereo vision system may have an overlap area with the field of view A2 of the second stereo vision system.
Additionally, an iVidar system may further include at least one monocular vision system with a field of view A3. The field of view A3 may have an overlap area with the fields of view A1 and A2.
One of the preferred configurations of the current invention iVidar is to mount the second stereo vision system so that its field of view A2 covers a perspective vanishing point in the along-track direction.
Furthermore, the baseline B2 may be configured to be at least 2× greater than B1 and the field of view A1 at least 3× greater than A2. As a result, the second stereo vision system has a detection and sensing range extended by at least 6× relative to the first stereo vision system.
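The 6× figure can be motivated with the same stereo error model: the relative depth error Z·Δd/(f·B) stays constant when the maximum range Z scales with f·B, so tripling the focal length (one third of the FOV) and doubling the baseline extends the usable range roughly sixfold. A minimal Python sketch follows; the focal lengths, baselines, disparity noise and error tolerance are illustrative assumptions, not values from the patent.

```python
def max_range_for_relative_error(focal_px, baseline_m, rel_error, disparity_noise_px=0.25):
    """Largest distance at which the relative depth error dZ/Z stays below rel_error.

    From dZ ~= Z^2 * dd / (f * B), the relative error is dZ/Z = Z * dd / (f * B),
    so Z_max = rel_error * f * B / dd, i.e. it scales linearly with f * B.
    """
    return rel_error * focal_px * baseline_m / disparity_noise_px

# Hypothetical second system with 3x the focal length (1/3 the FOV) and 2.5x the baseline
r1 = max_range_for_relative_error(focal_px=500, baseline_m=0.10, rel_error=0.05)   # 10 m
r2 = max_range_for_relative_error(focal_px=1500, baseline_m=0.25, rel_error=0.05)  # 75 m
print(r2 / r1)  # 7.5x, i.e. (focal-length ratio) x (baseline ratio)
```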
The method of computing a high-precision distance estimate at extended range using an iVidar includes: estimating an object distance in a field of view A2, estimating an object distance in a field of view A3, and estimating an object distance in a field of view A1.
The method of estimating an object distance in a field of view A2 using the second stereo vision system includes: computing epipolar line feature disparity; detecting an object O2 with an artificial-intelligence object detection neural network; registering the along-track distance of object O2 with a cluster of feature disparities on the boundary of the detected object O2 and computing the distance estimate based on the disparity map and perspective location; and outputting at least one of a disparity map, a pixel-based distance map, point clouds, or cuboid coordinates C2 of the object O2.
The method of estimating an object distance in a field of view A3 using the monocular vision system includes: detecting an object O3 with an artificial-intelligence object detection neural network; associating the object O3 with O2 detected by the second stereo vision system if O3 is also in the field of view A2; tracking the size change and perspective location of the object O3 to compute the distance estimate if O3 is outside the FOV A2; and outputting at least one of a disparity map, a pixel-based distance map, point clouds, or cuboid coordinates C3 of the object O3.
The method of estimating an object distance in a field of view A1 using the first stereo vision system includes: computing epipolar line feature disparity; detecting an object O1 with an artificial-intelligence object detection neural network; associating the object O1 with O2 detected by the second stereo vision system if O1 is also in the field of view A2; associating the object O1 with O3 detected by the monocular vision system if O1 is also in the field of view A3; registering the along-track distance of object O1 with a cluster of feature disparities on the boundary of the detected object O1 and computing the distance estimate based on the disparity map and perspective location; estimating the object O1 distance as the maximal-confidence distance among the first stereo vision system estimate, the second stereo vision system estimate, and the monocular vision system estimate; and outputting at least one of a disparity map, a pixel-based distance map, point clouds, or cuboid coordinates C1 of the object O1.
DESCRIPTION OF THE FIGURES
FIG. 1 illustrates one embodiment of an intelligent vision-based detection and ranging system including two stereo vision systems, a monocular vision system, a processor, a memory and a network interface.
FIG. 2 illustrates the horizontal fields of view of two stereo vision systems and a monocular vision system.
FIG. 3 illustrates a perspective view of the fields of view of two stereo vision systems and a monocular vision system.
FIG. 4 illustrates a flow chart of an intelligent vision-based detection and ranging method.
FIG. 5 illustrates another flow chart of an intelligent vision-based detection and ranging method.
DETAILED DESCRIPTION
FIG. 1 is a schematic block diagram illustrating an example intelligent vision-based detection and ranging system 100. In this example, the system 100 includes five cameras 102, 104, 106, 108, 110, a processor 130, a memory 140, and a network interface 150. In the example system, the system 100 is in communication with a car or robot navigation system (not shown in FIG. 1) via the network interface 150 using wired and/or wireless communication schemes.
In one of the preferred embodiments, the processor 130 is an Intel Movidius Myriad X VPU with a Neural Compute Engine and 16 vector cores; up to 8 cameras can be connected to the VPU directly. In another embodiment, the processor 130 could be an embedded AI computing system such as the Nvidia AGX Xavier, which consists of a 512-core GPU with 64 tensor cores and an 8-core Arm CPU.
Camera 102 and camera 104, connected to the processor 130 and the memory 140, form a first stereo vision system. Camera 106 and camera 108, connected to the processor 130 and the memory 140, form a second stereo vision system. Additionally, camera 110, connected to the processor 130 and the memory 140, forms a monocular vision system.
FIG. 2 illustrates an example configuration of the horizontal fields of view of a first stereo vision system, a second stereo vision system and a monocular vision system. The fields of view 202, 204 of the first stereo vision system, formed by camera 102 and camera 104 with the same focal length, merge into the same field of view at infinity, designated A1. The distance between the center of the field of view 202 of camera 102 and the center of the field of view 204 of camera 104 is called the baseline 120 and is designated B1. Similarly, the fields of view 206, 208 of the second stereo vision system, formed by camera 106 and camera 108 with the same focal length, merge into the same field of view at infinity, designated A2. The distance between the center of the field of view 206 of camera 106 and the center of the field of view 208 of camera 108 is called the baseline 122 and is designated B2. Additionally, the field of view 210 of the monocular vision system is designated A3.
In one of the preferred configurations, the first stereo vision system field of view A1 is greater than the second stereo vision system field of view A2, and the second stereo vision system baseline B2 is greater than the first stereo vision system baseline B1. Furthermore, A1 may have an overlap area with A2, and A3 may have an overlap area with at least one of A1 and A2.
In one embodiment, in the first stereo vision system, both camera 102 and camera 104 use OmniVision OV9282 global shutter monochrome CMOS sensors with 1280×800 resolution. The field of view A1 is 120 degrees horizontally and 80 degrees vertically, and the baseline B1 between camera 102 and camera 104 is 10 cm. In the second stereo vision system, both camera 106 and camera 108 also use OmniVision OV9282 global shutter monochrome CMOS sensors with 1280×800 resolution. However, the field of view A2 is 34 degrees horizontally and 22.7 degrees vertically, and the baseline B2 between camera 106 and camera 108 is 25 cm. In the monocular vision system, camera 110 uses Sony's IMX464 rolling shutter color CMOS sensor with 2688×1520 resolution. The field of view A3 is 87 degrees horizontally and 58 degrees vertically.
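For reference, the pinhole-model focal length in pixels implied by these horizontal FOVs and the 1280-pixel sensor width can be computed as below. This is an editorial illustration of the angular-resolution difference between the two stereo pairs, not a calculation taken from the patent.

```python
import math

def focal_length_px(image_width_px, hfov_deg):
    """Pinhole-model focal length in pixels from horizontal FOV and image width."""
    return (image_width_px / 2) / math.tan(math.radians(hfov_deg) / 2)

f_a1 = focal_length_px(1280, 120)  # wide stereo pair (A1), ~370 px
f_a2 = focal_length_px(1280, 34)   # narrow stereo pair (A2), ~2090 px
print(f_a1, f_a2, f_a2 / f_a1)     # the narrow pair has ~5.7x the angular resolution
```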
In another embodiment, both the first stereo vision system cameras and the second stereo vision system cameras use Sony's extremely light-sensitive IMX482 sensor with 1920×1080 resolution, which maintains usable SNR at light levels as low as 0.08 lux for better night-time detection capability. For the monocular vision system camera, Sony's IMX485 provides an even higher resolution of 3840×2160 to further improve the object distance estimate accuracy.
FIG. 3 illustrates a perspective view of the fields of view of the two stereo vision systems and the monocular vision system. The horizontal line 312 is the horizon of the perspective view, and the point 310 is the vanishing point of the perspective view. At infinity, the baseline B1 is negligible, so fields of view 202 and 204 merge into a single field of view 302 (A1) in the perspective view. Similarly, at infinity, the baseline B2 is negligible, so fields of view 206 and 208 merge into a single field of view 304 (A2). The perspective view of field of view A3 is 306.
In one of the preferred embodiments, the center of the second stereo vision system field of view A2 304 should be aligned with the perspective-view vanishing point 310. The center of the monocular vision system field of view A3 306 should be located within the field of view A2 304. Likewise, the center of the first stereo vision system field of view A1 302 should also be located within the field of view A2 304.
In yet another preferred embodiment, the centers of all fields of view A1 302, A2 304 and A3 306 should be aligned with the perspective-view vanishing point 310.
FIG. 4 illustrates a flow chart of the steps of an intelligent vision-based detection and ranging method. Steps 402, 404 and 406 are parallel steps that process the synchronized image frames from the first stereo vision system, the second stereo vision system and the monocular vision system, respectively. All of steps 402, 404 and 406 can be processed in parallel on a processor 130 with multiple CPU cores and a multi-tasking operating system.
In step 402, it includes step 402.2 to compute epipolar line stereo disparity of a first stereo vision system between a camera 102 image frame and a camera 104 image frame. Both image frames are synchronized with the same capture time and the same exposure control parameters. In one of the preferred embodiments, both image frames can be rectified and performed feature extraction such as edge detection before computing stereo disparity. An example disparity estimate neural network algorithms is called “Pyramid stereo matching network” published in Conference on Computer Vision and Pattern Recognition (CVPR), 2018 by J. R. Chang and Y. S. Chen. Furthermore, the resulting disparity map can be further refined to achieve sub-pixel accuracy using parabola curve fitting to a local neighborhood of global maximal correlation exemplified by a paper titled “High Accuracy Stereovision Approach for Obstacle Detection on Non-Planar Roads” by Nedevschi, S. et al. in Proceedings of the IEEE Intelligent Engineering Systems (INES), 19-21 Sep. 2004, pp 211-216.
In step 402, it also includes step 402.4 to detect objects in the image frame captured by a camera 102 and/or in the image frame captured by a camera 104. If resources allowed, step 402.2 and step 402.4 can be performed in parallel without depending to each other in terms of processing order. An example object detection neural network algorithms is called YOLO (You only look once) first published in Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on pages 6517-6525, by J. Redmon and A. Farhadi, titled: “Yolo9000: Better, faster, stronger”. In a preferred embodiment, a YOLOv3 is used to achieve real-time object detection at frame rate about 30 frames per second (fps). Other object detection neural network algorithms can be used as well, for example, MobileNets-SSD, published by Andrew G. Howard, et al. at Cornell University, arXiv:1704.04861[cs.CV], on Apr. 17, 2017.
In step 402, it includes step 402.6 which follows steps 402.2 and 402.4. In step 402.6, the object detected in a bounding box in step 402.4 will be registered with the disparity estimated in step 402.2. Typically, for example, if the object is a car, then a cuboid coordinates or the point clouds will be computed by using the disparity map established in step 402.2 within the bounding box as depicted in a paper “Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving” by Yan Wang, et al. at Cornell University, arXiv:1812.07179v5 [cs.CV], on Jun. 14, 2019.
In another approach, as shown in FIG. 5, the disparity map from step 402.2 can be combined with the synchronized frame images from camera 102 and camera 104 and then fed into a 3D object detection neural network 502.6. The 3D object detection neural network 502.6 can output a registered object distance directly, thus eliminating the conventional 2D object detection step 402.4.
The 3D object detection neural network algorithms 502.6 is an extension of a conventional 2D Single Shot MultiBox Detector (SSD). An example of SSD Neural Network is published by Wei Liu, et al. at Cornell University, arXiv:1512.02325v5 [cs.CV], on Dec. 29, 2016. Instead of trained by a set of 2D images and 2D bounding boxes ground truth, a 3D object detection neural network (3D-SSD) is trained by a set of 2D images with its associated stereo disparity map and 3D cuboid ground truth. Architecturally, 3D-SSD adds one disparity channel in additional to original RGB channel (or gray level channel) for deep convolutional feature map extraction. Then it extends 2D anchors (cx,cy,width,height) target bounding boxes into 3D anchors (cx,cy,cz,width,height,depth) target cuboids. During training, 3D-SSD will use cuboid loss function to learn to adjust (dx,dy,dz,dw,dh,dd) to match the ground truth cuboid. The registered object distance therefore is cz+dz. This combination of step 402.4 and step 402.6 in FIG. 4 into one step 502.6 in FIG. 5 further reduces the computation cost and the system latency which boosts up the overall frame rate performance of the current invention.
Referring to FIGS. 4 and 5, the concurrent step 404 is exactly the same as step 402 except that it processes the synchronized image frames from camera 106 and camera 108 of the second stereo vision system.
The concurrent step 406 contains only one sub-step, 406.4, for the monocular vision system. In step 406.4, the monocular vision system detects objects using, for example, the same YOLOv3 algorithm. In one of the preferred embodiments, all cameras 102, 104, 106, 108 and 110 are synchronized in capture time, so the same object can be associated and tracked across the different cameras.
Step 408 associates the objects detected in steps 402.4, 404.4 and 406.4 of FIG. 4, or in steps 502.6, 504.6 and 406.4 of FIG. 5. The associated detections are assigned the same label for the same object. For example, an object detected by the second stereo vision system in step 404.4 in the field of view 304 that is located at the same perspective location as an object detected by the monocular vision system in step 406.4 in the field of view 306 is assigned the same label for the same object, to be processed in the next step.
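One possible association scheme, sketched under the preferred configuration in which the field-of-view centers (optical axes) are aligned: a box from one camera is rescaled by the focal-length ratio about the principal points and then matched by intersection-over-union. The function names and the IoU criterion are editorial assumptions.

```python
def map_box_between_fovs(box, f_src_px, f_dst_px, cx_src, cy_src, cx_dst, cy_dst):
    """Project an (x, y, w, h) box from one camera image into another camera image.

    Assumes approximately aligned optical axes, so the mapping reduces to a scale
    by the focal-length ratio about the two principal points.
    """
    s = f_dst_px / f_src_px
    x, y, w, h = box
    return ((x - cx_src) * s + cx_dst, (y - cy_src) * s + cy_dst, w * s, h * s)

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes, used to associate detections."""
    ax2, ay2, bx2, by2 = a[0] + a[2], a[1] + a[3], b[0] + b[2], b[1] + b[3]
    ix = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    iy = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0
```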
In step 410, each object is tracked from the previous frame to the current frame, and various parameters such as the object distance, velocity and cuboid coordinates are updated. For example, each object might have three distance estimates, one each from the first stereo vision system, the second stereo vision system and the monocular vision system. At any particular frame, not every vision system has a valid estimate. For example, at frame N, an object M might have three distance estimates forming a vector (inf, 157 m, N/A), where 1) inf means the distance is beyond the range of the first stereo vision system, which covers 0.1 m to 30 m, so any object more than 30 m away is considered to be at infinity; 2) 157 m is the distance estimate for the same object by the second stereo vision system, which covers 1 m to 200 m, with any object more than 200 m away considered to be at infinity; and 3) N/A means the distance is unknown to the monocular vision system. This object might appear for the first time in frame N, so there is no tracking record from which to estimate the size change relative to the previous frame N−1 and thus no distance estimate from the monocular vision system. In frame N+1, the same object's distance estimates might look like (inf, 156 m, 156.2 m). The monocular vision system computes the 156.2 m estimate by using the perspective object-size change from frame N to frame N+1 together with the previous best distance estimate of 157 m, for example. If the average of the known distances is used as the best object distance estimate, then at frame N+1 the object distance estimate is 156.1 m (=average(156 m, 156.2 m)). The velocity of the object at the current frame can be updated as (156.1 m−157 m)/33 ms per frame = −27.3 m/sec ≈ −61 miles/hour, approaching the vehicle. In this example, the vehicle has about 5.7 sec (=156.1 m/27.3 m/sec) to stop to avoid crashing into this object. In another embodiment, the object distance can be estimated by the monocular vision system from the perspective-view location in the 2D image under a level-road and level-vehicle-pitch assumption; in that case, the distance of an object appearing for the first time in frame N can still be estimated.
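The fusion and velocity arithmetic in this example can be reproduced with the short sketch below; representing "inf" and "N/A" as math.inf and None and using a 33 ms frame period follow the text, while the helper itself is an editorial illustration.

```python
import math

def fuse_distance(estimates):
    """Fuse per-system distance estimates; math.inf ("inf") and None ("N/A") carry no information."""
    valid = [d for d in estimates if d is not None and math.isfinite(d)]
    return sum(valid) / len(valid) if valid else math.inf

d_prev = fuse_distance([math.inf, 157.0, None])    # frame N   -> 157.0 m
d_curr = fuse_distance([math.inf, 156.0, 156.2])   # frame N+1 -> 156.1 m
frame_dt = 0.033                                   # ~33 ms per frame at 30 fps
velocity = (d_curr - d_prev) / frame_dt            # ~ -27.3 m/s (closing)
time_to_stop = d_curr / abs(velocity)              # ~ 5.7 s
print(round(d_curr, 1), round(velocity, 1), round(time_to_stop, 1))
```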
The cuboid coordinates of an object can be updated by a method similar to that of the distance estimate. For example, at frame N, three cuboid coordinate estimates (C1, C2, C3) come from the first stereo vision system, the second stereo vision system and the monocular vision system. Likewise, the possible values for the cuboids include inf and N/A, which stand for infinity and not available. Applying the same method as described for the distance estimate, the cuboid coordinates are updated at each frame.
Step 412 can be configured to output at least one of the following: a disparity map, point clouds, or cuboids. The disparity map is generated from the first stereo vision system and the second stereo vision system. The point clouds are computed from the disparity map.
The cuboids are computed from the point clouds. An example formula for computing point clouds from a disparity map is given in the paper "Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving" by Yan Wang et al., arXiv:1812.07179v5 [cs.CV], Jun. 14, 2019. An example algorithm for grouping point clouds into cuboids is illustrated in an International Conference on Image and Graphics paper by Chengkun Cao et al., Nov. 28, 2019, pp. 179-190. If multiple estimates of the disparity map, point clouds or cuboids are available for the same object, the maximal-confidence estimates are output. For example, for the distance estimate, inf and N/A contribute nothing, so an object distance estimate of (inf, 157 m, N/A) results in a maximal-confidence estimate of 157 m. If more than one valid value is available, the average of the valid values is the maximal-confidence estimate; for example, (inf, 156 m, 156.2 m) results in a maximal-confidence estimate of 156.1 m. If each value has a confidence score, the maximal-confidence estimate can be the confidence-weighted average.
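A small sketch of the maximal-confidence rule just described, including the optional confidence-weighted variant; the confidence scores in the usage line are made-up values.

```python
import math

def confidence_weighted_estimate(values, confidences):
    """Confidence-weighted average over the valid (finite, non-None) estimates."""
    pairs = [(v, c) for v, c in zip(values, confidences)
             if v is not None and math.isfinite(v) and c > 0]
    if not pairs:
        return math.inf
    total = sum(c for _, c in pairs)
    return sum(v * c for v, c in pairs) / total

# e.g. (inf, 156 m, 156.2 m) with assumed confidences (0.0, 0.8, 0.6) -> ~156.1 m
print(confidence_weighted_estimate([math.inf, 156.0, 156.2], [0.0, 0.8, 0.6]))
```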
It is understood by those skilled in the art that by applying camera model information, a disparity map can be converted into a pixel-based distance map.
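A one-function sketch of that conversion under a rectified pinhole stereo model (Z = f·B/d); mapping unmatched pixels to infinity is an assumption about how invalid disparities are flagged.

```python
import numpy as np

def disparity_to_distance_map(disparity_map, focal_px, baseline_m):
    """Convert a disparity map into a per-pixel along-track distance map via Z = f*B/d."""
    z = np.full_like(disparity_map, np.inf, dtype=np.float32)
    valid = disparity_map > 0
    z[valid] = focal_px * baseline_m / disparity_map[valid]
    return z
```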
The flow diagrams depicted herein are just examples of the current invention. There may be many variations to these diagrams or the steps (or operations) described therein without departing from the spirit of the disclosure. For instance, the steps may be performed in a differing order, or steps may be added, deleted or modified.
While embodiments have been described, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements.

Claims (7)

What is claimed is:
1. An intelligent vision-based detection and ranging system, comprising:
a processor,
a first stereo vision system with a field of view A1 and a baseline B1,
wherein the first stereo vision system is configured to be communicatively coupled to the processor,
wherein the processor uses a left frame image and a right frame image from the first stereo vision system to detect a same object by using a 3D object detection neural network,
wherein the 3D object detection neural network is an extension of a 2D object detector that adds a disparity channel in addition to the RGB channels for deep convolutional feature map extraction;
wherein the 3D object detection neural network is an extension of a 2D object detector that extends 2D anchor (cx, cy, width, height) target bounding boxes into 3D anchor (cx, cy, cz, width, height, depth) target cuboids.
2. The intelligent vision-based detection and ranging system of claim 1, further comprising:
a second stereo vision system with a field of view A2 and a baseline B2,
wherein the A1 is greater than the A2 and the B2 is greater than the B1,
wherein the A1 may have an overlap area with the A2.
3. The intelligent vision-based detection and ranging system of claim 2, further comprising a monocular vision system with a field of view A3.
4. The intelligent vision-based detection and ranging system of claim 3, wherein the A3 may have an overlap area with at least one of the A1 and the A2.
5. The intelligent vision-based detection and ranging system of claim 2, wherein the B2 is at least 2× the B1 and the A1 is at least 3× the A2.
6. A method of estimating an object distance using an intelligent vision-based detection and ranging system, comprising:
feeding a left frame image and a right frame image into a 3D object detection neural network from a stereo vision system;
detecting a same object from the left frame image and the right frame image using the 3D object detection neural network;
computing a cuboid of the same object using the 3D object detection neural network;
wherein the 3D object detection neural network is an extension of a 2D object detector that adds a disparity channel in addition to the RGB channels for deep convolutional feature map extraction;
wherein the 3D object detection neural network is an extension of a 2D object detector that extends 2D anchor (cx, cy, width, height) target bounding boxes into 3D anchor (cx, cy, cz, width, height, depth) target cuboids.
7. A method of estimating an object distance using an intelligent vision-based detection and ranging system, comprising:
feeding a left frame image and a right frame image into a 3D object detection neural network from at least one of a first stereo vision system and a second stereo vision system;
detecting a same object from the left frame image and the right frame image using the 3D object detection neural network;
computing a cuboid of the same object using the 3D object detection neural network;
wherein the 3D object detection neural network is an extension of a 2D Single Shot MultiBox Detector that adds a disparity channel in addition to the RGB channels for deep convolutional feature map extraction;
wherein the 3D object detection neural network is an extension of a 2D Single Shot MultiBox Detector that extends 2D anchor (cx, cy, width, height) target bounding boxes into 3D anchor (cx, cy, cz, width, height, depth) target cuboids.
US16/864,914 2020-05-01 2020-05-01 Intelligent vision-based detection and ranging system and method Active US11004233B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/864,914 US11004233B1 (en) 2020-05-01 2020-05-01 Intelligent vision-based detection and ranging system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US16/864,914 US11004233B1 (en) 2020-05-01 2020-05-01 Intelligent vision-based detection and ranging system and method

Publications (1)

Publication Number Publication Date
US11004233B1 true US11004233B1 (en) 2021-05-11

Family

ID=75845817

Family Applications (1)

Application Number Title Priority Date Filing Date
US16/864,914 Active US11004233B1 (en) 2020-05-01 2020-05-01 Intelligent vision-based detection and ranging system and method

Country Status (1)

Country Link
US (1) US11004233B1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210378755A1 (en) * 2020-06-09 2021-12-09 Globus Medical, Inc. Camera tracking bar for computer assisted navigation during surgery
CN114998849A (en) * 2022-05-27 2022-09-02 电子科技大学 Traffic flow element sensing and positioning method based on road end monocular camera and application thereof
CN115082811A (en) * 2022-07-27 2022-09-20 大连海事大学 Method for identifying and measuring distance of marine navigation ship according to image data
EP4131174A1 (en) * 2021-08-05 2023-02-08 Argo AI, LLC Systems and methods for image based perception
US11645775B1 (en) * 2022-06-23 2023-05-09 Plusai, Inc. Methods and apparatus for depth estimation on a non-flat road with stereo-assisted monocular camera in a vehicle
US11663807B2 (en) 2021-08-05 2023-05-30 Ford Global Technologies, Llc Systems and methods for image based perception
CN117576665A (en) * 2024-01-19 2024-02-20 南京邮电大学 Automatic driving-oriented single-camera three-dimensional target detection method and system
US11966452B2 (en) 2021-08-05 2024-04-23 Ford Global Technologies, Llc Systems and methods for image based perception

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140104266A1 (en) * 2012-10-11 2014-04-17 Adobe Systems Incorporated 3d transformation of objects using 2d controls projected in 3d space and contextual face selections of a three dimensional bounding box
US20170193338A1 (en) * 2016-01-05 2017-07-06 Mobileye Vision Technologies Ltd. Systems and methods for estimating future paths
US20170206430A1 (en) * 2016-01-19 2017-07-20 Pablo Abad Method and system for object detection
US20180276841A1 (en) * 2017-03-23 2018-09-27 Intel Corporation Method and system of determining object positions for image processing using wireless network angle of transmission
US20190001868A1 (en) * 2017-06-30 2019-01-03 Mazda Motor Corporation Vehicle headlight and light distribution control device of vehicle headlight
US20200213498A1 (en) * 2017-09-11 2020-07-02 Signify Holding B.V. Detecting coded light with rolling-shutter cameras
US20190102677A1 (en) * 2017-10-03 2019-04-04 StradVision, Inc. Method for acquiring a pseudo-3d box from a 2d bounding box by regression analysis and learning device and testing device using the same
US20190349571A1 (en) * 2018-05-11 2019-11-14 Ford Global Technologies, Llc Distortion correction for vehicle surround view camera projections
US20200105065A1 (en) * 2018-09-27 2020-04-02 Eloupes, Inc. (dba Proprio Vision) Camera array for a mediated-reality system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Fidler et al., "3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model", Advances in Neural Information Processing Systems (Year: 2012). *
Xiang et al., "Estimating the aspect layout of object categories", Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference (Year: 2012). *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210378755A1 (en) * 2020-06-09 2021-12-09 Globus Medical, Inc. Camera tracking bar for computer assisted navigation during surgery
US11317973B2 (en) * 2020-06-09 2022-05-03 Globus Medical, Inc. Camera tracking bar for computer assisted navigation during surgery
US20220211447A1 (en) * 2020-06-09 2022-07-07 Globus Medical, Inc. Camera tracking bar for computer assisted navigation during surgery
EP4131174A1 (en) * 2021-08-05 2023-02-08 Argo AI, LLC Systems and methods for image based perception
US11663807B2 (en) 2021-08-05 2023-05-30 Ford Global Technologies, Llc Systems and methods for image based perception
US11966452B2 (en) 2021-08-05 2024-04-23 Ford Global Technologies, Llc Systems and methods for image based perception
CN114998849A (en) * 2022-05-27 2022-09-02 电子科技大学 Traffic flow element sensing and positioning method based on road end monocular camera and application thereof
CN114998849B (en) * 2022-05-27 2024-04-16 电子科技大学 Traffic flow element sensing and positioning method based on road-side monocular camera and application thereof
US11645775B1 (en) * 2022-06-23 2023-05-09 Plusai, Inc. Methods and apparatus for depth estimation on a non-flat road with stereo-assisted monocular camera in a vehicle
CN115082811A (en) * 2022-07-27 2022-09-20 大连海事大学 Method for identifying and measuring distance of marine navigation ship according to image data
CN117576665A (en) * 2024-01-19 2024-02-20 南京邮电大学 Automatic driving-oriented single-camera three-dimensional target detection method and system
CN117576665B (en) * 2024-01-19 2024-04-16 南京邮电大学 Automatic driving-oriented single-camera three-dimensional target detection method and system

Similar Documents

Publication Publication Date Title
US11004233B1 (en) Intelligent vision-based detection and ranging system and method
AU2018278901B2 (en) Systems and methods for updating a high-resolution map based on binocular images
JP7073315B2 (en) Vehicles, vehicle positioning systems, and vehicle positioning methods
CA3028653C (en) Methods and systems for color point cloud generation
US20190205668A1 (en) Object detecting apparatus, object detecting method, and computer program product
US10909395B2 (en) Object detection apparatus
JP2012185011A (en) Mobile position measuring apparatus
WO2020154990A1 (en) Target object motion state detection method and device, and storage medium
JP7190261B2 (en) position estimator
EP3803790B1 (en) Motion segmentation in video from non-stationary cameras
US20210174113A1 (en) Method for limiting object detection area in a mobile system equipped with a rotation sensor or a position sensor with an image sensor, and apparatus for performing the same
TWI754808B (en) Vehicle, vehicle positioning system, and vehicle positioning method
US11676403B2 (en) Combining visible light camera and thermal camera information
JP2020187474A (en) Traveling lane recognition device, traveling lane recognition method and program
KR20190134303A (en) Apparatus and method for image recognition
JP7315216B2 (en) Corrected Distance Calculation Device, Corrected Distance Calculation Program, and Corrected Distance Calculation Method
Leu et al. High speed stereo vision based automotive collision warning system
US20230316539A1 (en) Feature detection device, feature detection method, and computer program for detecting feature
KR102003387B1 (en) Method for detecting and locating traffic participants using bird's-eye view image, computer-readable recording medium storing traffic participants detecting and locating program
US20220245831A1 (en) Speed estimation systems and methods without camera calibration
EP4148375A1 (en) Ranging method and apparatus
JP6718025B2 (en) Device and method for identifying a small object area around a vehicle
CN216783393U (en) Visual system of intelligent vehicle
AU2018102199A4 (en) Methods and systems for color point cloud generation
JP7348874B2 (en) Tilt angle detection device and control device

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE