CN112602091A - Object detection using multiple neural networks trained for different image fields - Google Patents

Object detection using multiple neural networks trained for different image fields

Info

Publication number
CN112602091A
CN112602091A (application CN201980055920.4A)
Authority
CN
China
Prior art keywords
field image
far
image segment
segment
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201980055920.4A
Other languages
Chinese (zh)
Inventor
S·D·安丘
王北楠
J·格洛斯纳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Optimum Semiconductor Technologies Inc
Original Assignee
Optimum Semiconductor Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Optimum Semiconductor Technologies Inc filed Critical Optimum Semiconductor Technologies Inc
Publication of CN112602091A publication Critical patent/CN112602091A/en
Pending legal-status Critical Current

Classifications

    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • B60W60/0027 Planning or execution of driving tasks using trajectory prediction for other traffic participants
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G06F18/256 Fusion techniques of classification results relating to different input data, e.g. multimodal recognition
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G06T7/20 Analysis of motion
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/809 Fusion of classification results, e.g. where the classifiers operate on the same input data
    • G06V10/811 Fusion of classification results where the classifiers operate on different input data, e.g. multi-modal recognition
    • G06V10/82 Arrangements for image or video recognition or understanding using neural networks
    • B60W2420/403 Image sensing, e.g. optical camera
    • B60W2420/408 Radar; Laser, e.g. lidar
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G06N3/063 Physical realisation of neural networks using electronic means
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/30252 Vehicle exterior; Vicinity of vehicle
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V20/49 Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/588 Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Automation & Control Theory (AREA)
  • Human Computer Interaction (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

A system and method relating to object detection may include: receiving an image frame comprising an array of pixels captured by an image sensor associated with a processing device, identifying a near-field image segment and a far-field image segment in the image frame, applying a first neural network trained for the near-field image segment to detect objects present in the near-field image segment, and applying a second neural network trained for the far-field image segment to detect objects present in the far-field image segment.

Description

Object detection using multiple neural networks trained for different image fields
Cross Reference to Related Applications
This application claims priority to U.S. provisional application 62/711,695, filed July 30, 2018, the contents of which are incorporated herein by reference in their entirety.
Technical Field
The present disclosure relates to detecting objects in images, and more particularly, to systems and methods for object detection using multiple neural networks trained for different image fields.
Background
Computer systems programmed to detect objects in an environment have a wide range of industrial applications. For example, autonomous vehicles may be equipped with sensors (e.g., lidar sensors and cameras) to capture sensor data around the vehicle. Further, the autonomous vehicle may be equipped with a computer system that includes a processing device that executes executable code to detect objects around the vehicle based on the sensor data.
Neural networks are used for object detection. The neural network in the present disclosure is an artificial neural network that may be implemented using circuitry to make decisions based on input data. A neural network may include one or more layers of nodes, where each node may be implemented in hardware as a computational circuit element for performing computations. Nodes in the input layer may receive input data into the neural network. A node in an inner layer may receive output data generated by a node in a previous layer. Further, nodes in the layer may perform certain calculations and generate output data for nodes of subsequent layers. The nodes of the output layer may generate output data for the neural network. Thus, a neural network may contain multiple layers of nodes to perform the computation of the forward propagation from the input layer to the output layer.
Drawings
The present disclosure will be understood more fully from the detailed description given below and from the accompanying drawings of various embodiments of the disclosure. The drawings, however, should not be taken to limit the disclosure to the specific embodiments, but are for explanation and understanding only.
FIG. 1 illustrates a system for detecting an object using multiple compact neural networks matched to different image fields, according to an embodiment of the present disclosure.
Fig. 2 illustrates a decomposition of an image frame according to an embodiment of the present disclosure.
Fig. 3 illustrates a decomposition of an image frame into near field image segments and far field image segments according to an embodiment of the present disclosure.
FIG. 4 depicts a flow diagram of a method of using a multi-field object detector according to an embodiment of the present disclosure.
Fig. 5 depicts a block diagram of a computer system operating in accordance with one or more aspects of the present disclosure.
Detailed Description
The neural network may include a plurality of layers of nodes. The layers may include an input layer, an output layer, and hidden layers in between. The computation of the neural network propagates from the input layer through the hidden layers to the output layer. Each layer may include nodes whose values are calculated from the previous layer through edges connecting nodes of the current layer to nodes of the previous layer. An edge may connect a node in one layer to a node in an adjacent layer, and each edge may be associated with a weight value. Accordingly, the value associated with a node of the current layer may be a weighted sum of the node values of the previous layer.
One type of neural network is a convolutional neural network (CNN), where the computation performed at a hidden layer may be a convolution of the node values of the previous layer with the weight values associated with the edges. For example, the processing device may apply a convolution operation to the input layer to generate the node values of a first hidden layer connected to the input layer by edges, apply a convolution operation to the first hidden layer to generate the node values of a second hidden layer, and so on, until the computation reaches the output layer. The processing device may apply a soft combining operation to the output data to generate a detection result. The detection result may include the identity of the detected object and its location.
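For illustration only, the following is a minimal PyTorch sketch of the forward propagation described above: a compact CNN whose hidden layers are convolutions and whose output layer produces class scores and a coarse location estimate. The layer sizes, class count, and the name CompactDetector are assumptions made for this sketch and do not come from the disclosure.

```python
import torch
import torch.nn as nn

# Illustrative compact CNN: convolutional hidden layers followed by a small
# head that outputs class scores plus a bounding-box estimate (x, y, w, h).
class CompactDetector(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_classes + 4),  # class scores + box parameters
        )

    def forward(self, x):
        # Forward propagation from the input layer through the hidden layers
        # to the output layer.
        return self.head(self.features(x))

# One RGB image segment (here 80x50 pixels) propagated through the network.
outputs = CompactDetector()(torch.randn(1, 3, 80, 50))
```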
The topology of the neural network and the weight values associated with its edges are determined in a training phase. During the training phase, training input data is fed into the CNN in forward propagation (from the input layer to the output layer). The output of the CNN is compared with the target output data to calculate error data. Based on the error data, the processing device may perform backpropagation, in which the weight values associated with the edges are adjusted according to discriminant analysis. The process of forward and backward propagation may be repeated until the error data meets certain performance requirements during a verification process. The CNN can then be used for object detection. A CNN may be trained for a particular class of objects (e.g., humans) or for multiple classes of objects (e.g., cars, pedestrians, and trees).
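As a rough sketch of the forward/backward cycle described above, the step below computes the error against target labels and backpropagates it with gradient descent. It reuses the hypothetical CompactDetector from the previous sketch; the loss function and learning rate are illustrative choices, not values given in the disclosure.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def train_step(model, optimizer, images, labels):
    """One forward/backward pass: compute the error and adjust edge weights."""
    optimizer.zero_grad()
    class_scores = model(images)[:, :2]     # keep only the class-score outputs
    loss = criterion(class_scores, labels)  # error between output and target data
    loss.backward()                         # backward propagation of the error
    optimizer.step()                        # adjust weight values along the gradient
    return loss.item()
```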
Autonomous vehicles are often equipped with a computer system for object detection. Instead of relying on a human operator to detect objects in the surrounding environment, an in-vehicle computer system may be programmed to use sensors to capture information of the environment and detect objects based on sensor data. Sensors used by autonomous vehicles may include cameras, lidar, radar, and the like.
In some embodiments, one or more cameras are used to capture images of the surrounding environment. The camera may include an optical lens, an array of photosensitive elements, a digital image processing unit, and a storage device. The optical lens may receive the light beam and focus the light beam on an image plane. Each optical lens may be associated with a focal length, which is the distance between the lens and the image plane. In practice, the video camera may have a fixed focal length, where the focal length may determine the field of view (FOV). The field of view of an optical device (e.g., a video camera) refers to the viewable area through the optical device. A shorter focal length may be associated with a wider field of view; a longer focal length may be associated with a narrower field of view.
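The stated relationship between focal length and field of view can be made concrete with the standard pinhole-camera approximation; this is a general illustration, and the sensor width used below is an assumed value rather than one given in the disclosure.

```python
import math

def horizontal_fov_deg(sensor_width_mm: float, focal_length_mm: float) -> float:
    """Horizontal field of view of a fixed-focal-length camera (pinhole model)."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))

# Shorter focal length -> wider field of view (assumed 6.4 mm sensor width).
print(horizontal_fov_deg(6.4, 4.0))   # ~77.3 degrees
print(horizontal_fov_deg(6.4, 12.0))  # ~29.9 degrees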
The array of photosensitive elements may be fabricated in a silicon plane located at a position along the optical axis of the lens to capture the light beam passing through the lens. The image sensing elements may be Charge Coupled Device (CCD) elements, Complementary Metal Oxide Semiconductor (CMOS) elements or any suitable type of light sensitive devices. Each light sensitive element may capture a different color component (red, green, blue) of the light impinging on the light sensitive element. The array of photosensitive elements may comprise a rectangular array of a predetermined number of elements (e.g., M by N, where M and N are integers). The total number of elements in the array may determine the resolution of the camera.
The digital image processing unit is a hardware processor that can be coupled to an array of photosensitive elements to capture the response of these photosensitive elements to light. The digital image processing unit may include an analog-to-digital converter (ADC) to convert the analog signal from the photosensitive element into a digital signal. The digital image processing unit may also perform a filtering operation on the digital signal and encode the digital signal according to a video compression standard.
In one embodiment, the digital image processing unit may be coupled to a timing generator and record images captured by the photosensitive elements at a predetermined rate (e.g., 30 or 60 frames per second). Each recorded image is referred to as an image frame and comprises a rectangular array of pixels. Thus, image frames captured by a fixed-focal-length video camera at a fixed spatial resolution may be stored in a storage device for further processing, such as object detection, where the resolution is defined by the number of pixels per unit area of the image frame.
One technical challenge for autonomous vehicles is detecting human bodies based on images captured by one or more video cameras. A neural network may be trained to recognize human bodies in an image, and the trained neural network may then be deployed in actual operation to detect a human body. If the focal length is much shorter than the distance between the human body and the camera lens, the optical magnification of the video camera can be expressed as G = f/p = i/o, where p is the distance from the object to the center of the lens, f is the focal length, i (measured in number of pixels) is the length of the object projected onto the image frame, and o is the height of the object. As the distance p increases, the number of pixels associated with the object decreases. As a result, fewer pixels are used to capture the height of a human body at a distance. Since fewer pixels provide less information about the human body, a trained neural network may have difficulty detecting a distant human body. For example, assume the focal length f is 0.1 m (meters), the height o of the object is 2 m, the pixel density k is 100 pixels/mm, and the minimum number of pixels for object detection Nmin is 80 pixels. The maximum distance for reliable object detection is then p = f·o/(Nmin/k) = 0.1 × 2/(0.8 × 10⁻³) = 250 m. Therefore, a distance exceeding 250 m is defined as the far field. If i is 40 pixels, p is 500 m. Thus, if the far field covers the range 250-500 m, the resolution used to represent the object needs to be doubled from 40 pixels to 80 pixels.
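The arithmetic in this example can be reproduced in a few lines; the values of f, o, k, and Nmin below are the ones stated in the text, and the script is only a check of that calculation.

```python
# Maximum distance at which an object still spans N_min pixels:
#   p = f * o / (N_min / k), where N_min / k is the projected length on the sensor.
f = 0.1            # focal length, meters
o = 2.0            # object (human body) height, meters
k = 100_000        # pixel density: 100 pixels/mm = 100,000 pixels per meter
n_min = 80         # minimum pixels needed for reliable detection

p_max = f * o / (n_min / k)
print(p_max)             # 250.0 m -> beyond this range is treated as far field

# Halving the pixel requirement doubles the distance: 40 pixels -> 500 m.
print(f * o / (40 / k))  # 500.0
```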
To overcome the above-described and other drawbacks of object detection using neural networks, embodiments of the present disclosure provide systems and methods that may divide a two-dimensional region of an image frame into image segments. Each image segment may be associated with a particular image field including at least one of a far field or a near field. The image segments associated with the far field may have a higher resolution than the image segments associated with the near field. Thus, an image segment associated with the far field may include more pixels than an image segment associated with the near field. Embodiments of the present disclosure may further provide for each image segment a neural network trained specifically for that image segment, wherein the number of neural networks is the same as the number of image segments. Because each image segment is much smaller than the entire image frame, the neural network associated with the image segment is more compact and can provide more accurate detection results.
Embodiments of the present disclosure may also track detected human bodies through different segments associated with different fields (e.g., from far-field to near-field) to further reduce false alarm rates. When a human body moves within range of the lidar sensor, the lidar sensor and the video camera may be paired together to detect the human body.
FIG. 1 illustrates a system 100 for detecting an object using multiple compact neural networks matched to different image fields according to an embodiment of the present disclosure. As shown in FIG. 1, the system 100 may include a processing device 102, an accelerator circuit 104, and a memory device 106. The system 100 may optionally include sensors, such as a lidar sensor 120 and a video camera 122. The system 100 may be a computing system (e.g., on an autonomous vehicle) or a system on a chip (SoC). The processing device 102 may be a hardware processor, such as a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a general-purpose processing unit. In one embodiment, the processing device 102 may be programmed to perform certain tasks, including delegating compute-intensive tasks to the accelerator circuit 104.
The accelerator circuit 104 may be communicatively coupled to the processing device 102 to perform computationally intensive tasks using dedicated circuitry therein. The dedicated circuitry may be an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), a network processor, or the like. In one embodiment, the accelerator circuit 104 may include a plurality of compute circuit elements (CCEs), which are circuit elements that may be programmed to perform a certain type of computation. For example, to implement a neural network, CCEs may be programmed under the instructions of the processing device 102 to perform operations such as weighted summation and convolution. Thus, each CCE may be programmed to perform the computation associated with a node of the neural network, and a set of CCEs of the accelerator circuit 104 may be programmed as a layer (visible or hidden) of nodes in the neural network. In one embodiment, in addition to performing the calculations, a CCE may also include local storage (e.g., registers, not shown) to store parameters (e.g., synaptic weights) used in the calculations. Thus, for conciseness of description, each CCE in the present disclosure corresponds to a circuit element that carries out the calculation of parameters associated with a node of the neural network. The processing device 102 may be programmed with instructions to construct the architecture of the neural network and train the neural network for a particular task.
The memory device 106 may include a storage device communicatively coupled to the processing device 102 and the accelerator circuit 104. In one implementation, the memory device 106 may store input data 116 to the multi-field object detector 108 for execution by the processing device 102 and output data 118 generated by the multi-field object detector 108. The input data 116 may be sensor data captured by sensors such as a lidar sensor 120 and a video camera 122. The output data may be the result of object detection by the multi-field object detector 108. The object detection result may be recognition of a human body.
In one embodiment, the processing device 102 may be programmed to execute a multi-field object detector 108 which, when executed, may detect a human body based on the input data 116. Instead of utilizing a single neural network to detect objects in the full-resolution image frames captured by the video camera 122, implementations of the multi-field object detector 108 may employ a combination of several reduced-complexity neural networks to achieve object detection. In one embodiment, the multi-field object detector 108 may decompose a video image captured by the video camera 122 into a near-field image segment and a far-field image segment, where the far-field image segment may have a higher resolution than the near-field image segment. The size of either the far-field image segment or the near-field image segment is smaller than the size of the full-resolution image. The multi-field object detector 108 may apply a convolutional neural network (CNN) 110 trained specifically for near-field image segments to the near-field image segment, and a CNN 112 trained specifically for far-field image segments to the far-field image segment. The multi-field object detector 108 may also track, over time, a human body detected in the far field as it approaches the near field, until the human body reaches the range of the lidar sensor 120. The multi-field object detector 108 may then apply a CNN 114 trained specifically for lidar data to the lidar data. Because the CNNs 110, 112 are trained for the near-field and far-field image segments, respectively, they may be compact CNNs that are smaller than a CNN trained for full-resolution images.
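At a high level, the detector routes each field-specific segment to its own compact network, as the following rough sketch illustrates. The function names are placeholders for this sketch (the CNN arguments stand for trained models that return detection lists, and extract_segments is sketched below, after the discussion of FIG. 3); none of these identifiers come from the disclosure.

```python
# Hypothetical dispatch loop for a multi-field object detector.
def detect_objects(frame, near_cnn, far_cnn, lidar_cnn=None, lidar_data=None):
    near_seg, far_seg = extract_segments(frame)   # see the extraction sketch below
    detections = []
    detections += near_cnn(near_seg)              # CNN trained on near-field segments
    detections += far_cnn(far_seg)                # CNN trained on far-field segments
    if lidar_cnn is not None and lidar_data is not None:
        detections += lidar_cnn(lidar_data)       # optional lidar-trained network
    return detections
```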
The multi-field object detector 108 can decompose the full-resolution image into a near-field image representation (referred to as the "near-field image segment") that captures objects closer to the optical lens and a far-field image representation (referred to as the "far-field image segment") that captures objects farther from the optical lens. Fig. 2 illustrates a decomposition of an image frame according to an embodiment of the present disclosure. As shown in fig. 2, the optical system of the video camera 200 may include a lens 202 and an image plane (e.g., an array of photosensitive elements) 204 at a distance from the lens 202, wherein the image plane is within the depth of field of the video camera. Depth of field refers to the range of distances over which an object captured at the image plane appears acceptably sharp in the image. Objects far from the lens 202 are projected onto a small area of the image plane and therefore require a higher resolution (sharper focus, more pixels) to be identified. In contrast, objects near the lens 202 are projected onto a large area of the image plane and can therefore be identified at a lower resolution (fewer pixels). As shown in fig. 2, the near-field image segment covers a larger area of the image plane than the far-field image segment. In some cases, the near-field image segment may overlap a portion of the far-field image segment in the image plane.
Fig. 3 illustrates the decomposition of an image frame 300 into near field image segments 302 and far field image segments 304 according to an embodiment of the present disclosure. Although the above embodiments are discussed using near field and far field image segments as an example, embodiments of the present disclosure may also include more than two image segments, where each image segment is associated with a specially trained neural network. For example, the image segments may include a near field image segment, a mid field image segment, and a far field image segment. The processing device may apply different neural networks to the near field, mid field, and far field image segments for human detection.
The video camera may record a stream of image frames that includes a pixel array corresponding to the photosensitive elements on the image plane 204. Each image frame may include a plurality of rows of pixels. Thus, as shown in FIG. 2, the area of image frame 300 is proportional to the area of image plane 204. As shown in fig. 3, the near field image segment 302 may cover a larger portion of the image frame than the far field image segment 304 because objects near the optical lens are projected onto a larger area of the image plane. In one embodiment, the near field image segment 302 and the far field image segment 304 may be extracted from an image frame, where the near field image segment 302 is associated with a lower resolution (e.g., a sparse sampling pattern 306) and the far field image segment 304 is associated with a higher resolution (e.g., a dense sampling pattern 308).
In one embodiment, the processing device 102 may execute an image pre-processor to extract the near field image segment 302 and the far field image segment 304. The processing device 102 may first identify the top band 310 and the bottom band 312 of the image frame 300 and discard them. The processing device 102 may identify the top band 310 as a first predetermined number of pixel rows and the bottom band 312 as a second predetermined number of pixel rows. The processing device 102 may discard the top band 310 and the bottom band 312 because they cover the sky and the road directly in front of the camera, respectively, and typically do not contain a human body.
The processing device 102 may further identify a first range of pixel rows for the near field image segment 302 and a second range of pixel rows for the far field image segment 304, where the first range may be larger than the second range. The first range may include a third predetermined number of pixel rows in the middle of the image frame; the second range may include a fourth predetermined number of pixel rows vertically above the centerline of the image frame. The processing device 102 may decimate pixels within the first range of pixel rows using the sparse sub-sampling pattern 306 and decimate pixels within the second range of pixel rows using the dense sub-sampling pattern 308. In one embodiment, the near field image segment 302 is decimated using a large decimation factor (e.g., 8) and the far field image segment 304 is decimated using a small decimation factor (e.g., 2), such that the resolution of the extracted far field image segment 304 is higher than that of the extracted near field image segment 302. In one embodiment, the resolution of the far field image segment 304 may be twice the resolution of the near field image segment 302. In another embodiment, the resolution of the far field image segment 304 may be greater than twice the resolution of the near field image segment 302.
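The following is a minimal NumPy sketch of this extraction step. The decimation factors (8 and 2) follow the example values above, but the band sizes and the far-field row range are assumptions chosen only to make the sketch self-contained; a real pre-processor would derive them from the camera geometry.

```python
import numpy as np

def extract_segments(frame: np.ndarray,
                     top_rows: int = 100, bottom_rows: int = 100,
                     near_decimation: int = 8, far_decimation: int = 2):
    """Split an H x W x 3 frame into a low-resolution near-field segment and a
    higher-resolution far-field segment (band sizes are illustrative)."""
    h = frame.shape[0]
    body = frame[top_rows:h - bottom_rows]      # discard sky and near-road bands

    # Near field: a wide band of middle rows, sparsely sub-sampled.
    near = body[::near_decimation, ::near_decimation]

    # Far field: a narrow band above the frame centerline, densely sub-sampled.
    center = h // 2
    far_band = frame[center - 120:center - 20]  # assumed row range
    far = far_band[::far_decimation, ::far_decimation]
    return near, far
```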
The video camera may capture a stream of image frames at a certain frame rate (e.g., 30 or 60 frames per second). The processing device 102 may execute the image pre-processor to extract the corresponding near field image segment 302 and far field image segment 304 for each image frame in the stream. In one embodiment, a first neural network is trained for human detection based on near-field image segment data and a second neural network is trained for human detection based on far-field image segment data. The number of nodes in the first and second neural networks is small compared to a neural network trained on full-resolution image frames.
FIG. 4 depicts a flow diagram of a method 400 of using a multi-field object detector in accordance with an embodiment of the present disclosure. The method 400 may be performed by a processing device that may comprise hardware (e.g., circuitry, dedicated logic), computer readable instructions (e.g., running on a general purpose computer system or a dedicated machine), or a combination of both. The method 400 and its individual functions, routines, subroutines, or operations may each be performed by one or more processors of a computer device executing the method. In some embodiments, method 400 may be performed by a single processing thread. Alternatively, the method 400 may be performed by two or more processing threads, each thread performing one or more separate functions, routines, subroutines, or operations of the method.
For simplicity of explanation, the methodologies of the present disclosure are depicted and described as a series of acts. However, acts may occur in various orders and/or concurrently, and with other acts not presented and described herein, in accordance with the disclosure. Moreover, not all illustrated acts may be required to implement a methodology in accordance with the disclosed subject matter. In addition, those skilled in the art will understand and appreciate that the methodologies could alternatively be represented as a series of interrelated states via a state diagram or events. Additionally, it should be appreciated that the methodologies disclosed in this specification are capable of being stored on an article of manufacture to facilitate transporting and transferring such methodologies to computing devices. The term "article of manufacture" as used herein is intended to encompass a computer program accessible from any computer-readable device or storage media. In one embodiment, the method 400 may be performed by the processing device 102 executing the multi-field object detector 108 and the accelerator circuit 104 supporting CNN as shown in fig. 1.
Compact neural networks for human detection may require training before being deployed on an autonomous vehicle. During the training process, the weight parameters associated with the edges of the neural network may be adjusted and selected based on certain criteria. The training of the neural network may be done offline using a publicly available database. These publicly available databases may include images of outdoor scenes that include human bodies that have been manually marked. In one embodiment, the images of the training data may be further processed to identify human bodies in the far field and the near field. For example, the far field image may be a 50x80 pixel window cropped from the image. Thus, the training data may include far-field training data and near-field training data. Training can be performed offline by a more powerful computer (referred to as a "training computer system").
The processing device of the training computer system may train a first neural network based on the near-field training data and a second neural network based on the far-field training data. The neural networks may be convolutional neural networks (CNNs), and the training may be based on backpropagation. The trained first and second neural networks are smaller than a neural network trained on full-resolution image frames. After training, the first and second neural networks may be used by the autonomous vehicle to detect objects (e.g., human bodies) on the road.
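A rough offline-training sketch under these assumptions is shown below. It reuses the hypothetical CompactDetector and train_step from the earlier sketches; near_loader and far_loader are placeholders for data loaders over labeled near-field segments and far-field crops (e.g., the 50x80 pixel windows mentioned above), and the epoch count and learning rate are arbitrary.

```python
import torch

# Hypothetical offline training of the two compact CNNs on a training computer.
near_net, far_net = CompactDetector(), CompactDetector()
near_opt = torch.optim.SGD(near_net.parameters(), lr=1e-3)
far_opt = torch.optim.SGD(far_net.parameters(), lr=1e-3)

for _ in range(20):                                    # illustrative epoch count
    for images, labels in near_loader:                 # labeled near-field segments
        train_step(near_net, near_opt, images, labels)
    for images, labels in far_loader:                  # labeled far-field crops
        train_step(far_net, far_opt, images, labels)
```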
Referring to fig. 4, at 402, the processing device 102 (or a different processing device on the autonomous vehicle) may identify a stream of image frames captured by a video camera during operation of the autonomous vehicle. The processing device is to detect human bodies in this stream.
At 404, the processing device 102 may extract near-field image segments and far-field image segments from the image frames of the stream using the method described above in connection with fig. 3. The resolution of the near field image segments may be lower than the resolution of the far field image segments.
At 406, the processing device 102 may apply a first neural network trained based on near-field training data to the near-field image segment to identify a human body in the near-field image segment.
At 408, the processing device 102 may apply a second neural network trained based on far-field training data to the far-field image segment to identify the human body in the far-field image segment.
At 410, in response to detecting a human body in the far-field image segment, the processing device 102 may record the detected human body in a record and track the human body through image frames from the far field to the near field. The processing device 102 may use a polynomial fit and/or a Kalman predictor to predict the location of the detected human body in a subsequent image frame, and apply the second neural network to the far-field image segment extracted from the subsequent image frame to determine whether the human body is located at the predicted location. If the processing device determines that no human body is present at the predicted position, the detection is considered a false positive and the entry corresponding to the human body is deleted from the record.
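As an illustration of the tracking step, the sketch below extrapolates a track with a low-order polynomial fit and prunes it when predictions repeatedly go unconfirmed; a Kalman predictor could be substituted for the polynomial fit. The class name, gating threshold, and miss-counting logic are assumptions made for this sketch.

```python
import numpy as np

class Track:
    """Track a far-field detection across frames by polynomial extrapolation."""
    def __init__(self, first_xy):
        self.history = [first_xy]   # (x, y) image locations over time
        self.misses = 0

    def predict(self):
        pts = np.array(self.history)
        if len(self.history) < 3:
            return pts[-1]
        xs = np.arange(len(self.history))
        # Fit a quadratic to each coordinate and extrapolate one frame ahead.
        fx = np.polyfit(xs, pts[:, 0], 2)
        fy = np.polyfit(xs, pts[:, 1], 2)
        t = len(self.history)
        return np.array([np.polyval(fx, t), np.polyval(fy, t)])

    def update(self, detections, gate=20.0):
        """Associate the nearest detection with the prediction; count misses."""
        pred = self.predict()
        if detections:
            d = min(detections, key=lambda xy: np.linalg.norm(np.array(xy) - pred))
            if np.linalg.norm(np.array(d) - pred) < gate:
                self.history.append(d)
                self.misses = 0
                return True
        self.misses += 1   # repeated misses mark the track as a false positive
        return False
```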
At 412, the processing device 102 may further determine whether the approaching human body is within range of a lidar sensor paired with the video camera on the autonomous vehicle for human detection. The lidar may detect objects at a range shorter than the far field but within the near field. In response to determining that the human body is within range of the lidar sensor (e.g., by detecting an object at a corresponding location in a far-field image segment), the processing device may apply a third neural network trained on lidar sensor data to the lidar sensor data and apply the second neural network to the far-field image segment (or the first neural network to the near-field image segment). In this way, lidar sensor data may be used in conjunction with image data to further improve human detection.
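A very simple fusion rule consistent with this step is sketched below: once the tracked object is within lidar range, a detection is confirmed only when the image-based network and the lidar-trained network agree. The range value and function name are assumptions; the disclosure does not specify a fusion rule or sensor range.

```python
# Hypothetical fusion step: confirm a tracked human body once it enters lidar range.
LIDAR_RANGE_M = 100.0   # assumed sensor range, not specified in the disclosure

def confirm_with_lidar(track_distance_m: float, camera_hit: bool, lidar_hit: bool) -> bool:
    """Require agreement between the image-based CNN and the lidar-trained CNN
    once the tracked object is within lidar range; otherwise rely on the camera."""
    if track_distance_m <= LIDAR_RANGE_M:
        return camera_hit and lidar_hit
    return camera_hit
```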
The processing device 102 may also operate the autonomous vehicle based on the detection of the human body. For example, the processing device 102 may operate the vehicle to stop or avoid a collision with a human body.
Fig. 5 depicts a block diagram of a computer system operating in accordance with one or more aspects of the present disclosure. In various illustrative examples, computer system 500 may correspond to system 100 of fig. 1.
In some embodiments, computer system 500 may be connected (e.g., via a network such as a Local Area Network (LAN), intranet, extranet, or the internet) to other computer systems. The computer system 500 may operate in a client-server environment as a server or client computer, or in a peer-to-peer or distributed network environment as a peer computer. Computer system 500 may be provided by a Personal Computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a Web appliance, a server, a network router, switch or bridge, or any device capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that device. Furthermore, the term "computer" shall include any collection of computers that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
In another aspect, computer system 500 may include a processing device 502, a volatile memory 504 (e.g., Random Access Memory (RAM)), a non-volatile memory 506 (e.g., read-only memory (ROM) or Electrically Erasable Programmable ROM (EEPROM)), and a data storage device 516, which may communicate with each other via a bus 508.
The processing device 502 may be provided by one or more processors, such as a general-purpose processor (e.g., a Complex Instruction Set Computing (CISC) microprocessor, Reduced Instruction Set Computing (RISC) microprocessor, Very Long Instruction Word (VLIW) microprocessor, microprocessor implementing other types of instruction sets, or microprocessors implementing combinations of various types of instruction sets) or a special-purpose processor (e.g., an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), or a network processor).
The computer system 500 may also include a network interface device 522. The computer system 500 may also include a video display unit 510 (e.g., an LCD), an alphanumeric input device 512 (e.g., a keyboard), a cursor control device 514 (e.g., a mouse), and a signal generation device 520.
Data storage 516 may include a non-transitory computer-readable storage medium 524 on which may be stored instructions 526 encoding any one or more of the methods or functions described herein, including instructions of multi-field object detector 108 of fig. 1, for implementing method 400.
The instructions 526 may also reside, completely or partially, within the volatile memory 504 and/or within the processing device 502 during execution thereof by the computer system 500, such that the volatile memory 504 and the processing device 502 may also constitute machine-readable storage media.
While the computer-readable storage medium 524 is shown in an illustrative example to be a single medium, the term "computer-readable storage medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of executable instructions. The term "computer-readable storage medium" shall also be taken to include a tangible medium that is capable of storing or encoding a set of instructions for execution by the computer to cause the computer to perform any one or more of the methodologies described herein. The term "computer readable storage medium" shall include, but not be limited to, solid-state memories, optical media, and magnetic media.
The methods, components and features described herein may be implemented by discrete hardware components or may be integrated into the functionality of other hardware components such as ASICs, FPGAs, DSPs or similar devices. Additionally, the methods, components and features may be implemented by firmware modules or functional circuits within a hardware device. Furthermore, the methods, components and features may be implemented in any combination of hardware devices and computer program components, or in a computer program.
Unless specifically stated otherwise, terms such as "receiving," "associating," "determining," "updating," or the like, refer to the action and processes performed or effected by a computer system that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices. Also, as used herein, the terms "first," "second," "third," "fourth," etc. refer to labels used to distinguish between different elements, and may not have an ordinal meaning according to their numerical designation.
Examples described herein also relate to an apparatus for performing the methods described herein. The apparatus may be specially constructed for carrying out the methods described herein, or it may comprise a general purpose computer system selectively programmed by a computer program stored in the computer system. Such a computer program may be stored in a tangible storage medium readable by a computer.
The methods and illustrative examples described herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with the teachings described herein, or it may prove convenient to construct a more specialized apparatus to perform the method 400 and/or each of its individual functions, routines, subroutines, or operations. Structural examples of various of these systems are set forth in the description above.
The above description is intended to be illustrative, and not restrictive. While the present disclosure has been described with reference to specific illustrative examples and embodiments, it will be recognized that the present disclosure is not limited to the described examples and embodiments. The scope of the disclosure should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

Claims (20)

1. A method for detecting an object using a plurality of sensor devices, comprising:
receiving, by a processing device, an image frame comprising an array of pixels captured by an image sensor associated with the processing device;
identifying, by the processing device, near field image segments and far field image segments in the image frame;
applying, by the processing device, a first neural network trained for a near-field image segment to the near-field image segment to detect an object present in the near-field image segment; and
applying, by the processing device, a second neural network trained for a far-field image segment to the far-field image segment to detect objects present in the far-field image segment.
2. The method of claim 1, wherein each of the near field image segment or the far field image segment includes fewer pixels than the image frame.
3. The method according to claim 1 or 2, wherein the near field image segment comprises a first number of pixel rows and the far field image segment comprises a second number of pixel rows, and wherein the first number of pixel rows is smaller than the second number of pixel rows.
4. A method according to claim 1 or 2, wherein the number of pixels of the near field image segment is less than the number of pixels of the far field image segment.
5. A method according to claim 1 or 2, wherein the resolution of the near field image segments is lower than the resolution of the far field image segments.
6. The method of claim 1 or 2, wherein the near field image segments capture a scene at a first distance from an image plane of the image sensor and the far field image segments capture a scene at a second distance from the image plane, and wherein the first distance is less than the second distance.
7. The method of claim 1, further comprising:
operating an autonomous vehicle based on detection of a first object or a second object in response to at least one of identifying the first object in the near field image segment or identifying the second object in the far field image segment.
8. The method of claim 1, further comprising:
in response to detecting a second object in the far field image segment, tracking the second object over time through a plurality of image frames from a range associated with the far field image segment to a range associated with one of the near field image segment or the far field image segment;
determining a range of the second object in a second image frame to reach a lidar sensor based on tracking the second object over time;
receiving lidar sensor data captured by the lidar sensor; and
applying a trained third neural network to the lidar sensor data to detect an object.
9. The method of claim 8, further comprising:
applying the first neural network to the near-field image segments of the second image frame or applying the second neural network to the far-field image segments of the second image frame; and
validating an object detected by at least one of applying the first neural network or applying the second neural network using an object detected by applying the third neural network.
10. A system for detecting an object using a plurality of sensor devices, comprising:
an image sensor;
a storage device for storing instructions; and
a processing device, communicatively coupled to the image sensor and the storage device, to execute the instructions to:
receiving an image frame comprising an array of pixels captured by an image sensor associated with the processing device;
identifying near field image segments and far field image segments in the image frame;
applying a first neural network trained on a near-field image segment to the near-field image segment to detect objects present in the near-field image segment; and
applying a second neural network trained on a far-field image segment to the far-field image segment to detect objects present in the far-field image segment.
11. The system of claim 10, wherein each of the near field image segment or the far field image segment includes fewer pixels than the image frame.
12. The system of claim 10 or 11, wherein the near field image segment comprises a first number of pixel rows and the far field image segment comprises a second number of pixel rows, and wherein the first number of pixel rows is less than the second number of pixel rows.
13. The system according to claim 10 or 11, wherein the number of pixels of the near field image segment is less than the number of pixels of the far field image segment.
14. The system according to claim 10 or 11, wherein the resolution of the near field image segments is lower than the resolution of the far field image segments.
15. The system of claim 10 or 11, wherein the near field image segments capture a scene at a first distance from an image plane of the image sensor and the far field image segments capture a scene at a second distance from the image plane, and wherein the first distance is less than the second distance.
16. The system of claim 10, wherein the processing device is to:
operating an autonomous vehicle based on detection of a first object or a second object in response to at least one of identifying the first object in the near field image segment or identifying the second object in the far field image segment.
17. The system of claim 10, further comprising a lidar sensor, wherein the processing device is to:
in response to detecting a second object in the far field image segment, tracking the second object over time through a plurality of image frames from a range associated with the far field image segment to a range associated with one of the near field image segment or the far field image segment;
determining a range of the second object in a second image frame to reach a lidar sensor based on tracking the second object over time;
receiving lidar sensor data captured by the lidar sensor; and
applying a trained third neural network to the lidar sensor data to detect an object.
18. The system of claim 17, wherein the processing device is to:
applying the first neural network to the near-field image segments of the second image frame or applying the second neural network to the far-field image segments of the second image frame; and
validating an object detected by at least one of applying the first neural network or applying the second neural network using an object detected by applying the third neural network.
19. A non-transitory machine-readable storage medium storing instructions that, when executed, cause a processing device to perform operations for detecting an object using a plurality of sensor devices, the operations comprising:
receiving, by the processing device, an image frame comprising an array of pixels captured by an image sensor associated with the processing device;
identifying, by the processing device, near field image segments and far field image segments in the image frame;
applying, by the processing device, a first neural network trained for a near-field image segment to the near-field image segment to detect an object present in the near-field image segment; and
applying, by the processing device, a second neural network trained for a far-field image segment to the far-field image segment to detect objects present in the far-field image segment.
20. The non-transitory machine-readable storage medium of claim 19, wherein the near field image segment includes a first number of rows of pixels and the far field image segment includes a second number of rows of pixels, and wherein the first number of rows of pixels is less than the second number of rows of pixels.
CN201980055920.4A 2018-07-30 2019-07-24 Object detection using multiple neural networks trained for different image fields Pending CN112602091A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201862711695P 2018-07-30 2018-07-30
US62/711695 2018-07-30
PCT/US2019/043244 WO2020028116A1 (en) 2018-07-30 2019-07-24 Object detection using multiple neural networks trained for different image fields

Publications (1)

Publication Number Publication Date
CN112602091A (en) 2021-04-02

Family

ID=69232087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980055920.4A Pending CN112602091A (en) 2018-07-30 2019-07-24 Object detection using multiple neural networks trained for different image fields

Country Status (5)

Country Link
US (1) US20220114807A1 (en)
EP (1) EP3830751A4 (en)
KR (1) KR20210035269A (en)
CN (1) CN112602091A (en)
WO (1) WO2020028116A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023155433A1 (en) * 2022-02-16 2023-08-24 海信视像科技股份有限公司 Video image analysis apparatus and video analysis method
WO2024044887A1 (en) * 2022-08-29 2024-03-07 Huawei Technologies Co., Ltd. Vision-based perception system

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7115502B2 (en) 2020-03-23 2022-08-09 トヨタ自動車株式会社 Object state identification device, object state identification method, computer program for object state identification, and control device
JP7388971B2 (en) 2020-04-06 2023-11-29 トヨタ自動車株式会社 Vehicle control device, vehicle control method, and vehicle control computer program
JP7359735B2 (en) * 2020-04-06 2023-10-11 トヨタ自動車株式会社 Object state identification device, object state identification method, computer program for object state identification, and control device
US11574100B2 (en) * 2020-06-19 2023-02-07 Micron Technology, Inc. Integrated sensor device with deep learning accelerator and random access memory
US20230004760A1 (en) * 2021-06-28 2023-01-05 Nvidia Corporation Training object detection systems with generated images
KR102485099B1 (en) * 2021-12-21 2023-01-05 주식회사 인피닉 Method for data purification using meta data, and computer program recorded on record-medium for executing method therefor
KR20230095505A (en) * 2021-12-22 2023-06-29 경기대학교 산학협력단 Video visual relation detection system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102997900A (en) * 2011-09-15 2013-03-27 歌乐株式会社 Vehicle systems, devices, and methods for recognizing external worlds
CN105404844A (en) * 2014-09-12 2016-03-16 广州汽车集团股份有限公司 Road boundary detection method based on multi-line laser radar
US9672446B1 (en) * 2016-05-06 2017-06-06 Uber Technologies, Inc. Object detection for an autonomous vehicle
CN106934426A (en) * 2015-12-29 2017-07-07 三星电子株式会社 Method and apparatus for a neural network based on image signal processing
CN107122770A (en) * 2017-06-13 2017-09-01 驭势(上海)汽车科技有限公司 Multi-view camera system, intelligent driving system, automobile, method and storage medium
CN108229277A (en) * 2017-03-31 2018-06-29 北京市商汤科技开发有限公司 Gesture identification, control and neural network training method, device and electronic equipment
CN108334081A (en) * 2017-01-20 2018-07-27 福特全球技术公司 Deep recurrent convolutional neural network for object detection

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7841533B2 (en) * 2003-11-13 2010-11-30 Metrologic Instruments, Inc. Method of capturing and processing digital images of an object within the field of view (FOV) of a hand-supportable digitial image capture and processing system
US8165407B1 (en) * 2006-10-06 2012-04-24 Hrl Laboratories, Llc Visual attention and object recognition system
WO2008103929A2 (en) * 2007-02-23 2008-08-28 Johnson Controls Technology Company Video processing systems and methods
US9542626B2 (en) * 2013-09-06 2017-01-10 Toyota Jidosha Kabushiki Kaisha Augmenting layer-based object detection with deep convolutional neural networks
US10564714B2 (en) * 2014-05-09 2020-02-18 Google Llc Systems and methods for biomechanically-based eye signals for interacting with real and virtual objects
US20170206426A1 (en) * 2016-01-15 2017-07-20 Ford Global Technologies, Llc Pedestrian Detection With Saliency Maps
US9760806B1 (en) * 2016-05-11 2017-09-12 TCL Research America Inc. Method and system for vision-centric deep-learning-based road situation analysis
US20190340306A1 (en) * 2017-04-27 2019-11-07 Ecosense Lighting Inc. Methods and systems for an automated design, fulfillment, deployment and operation platform for lighting installations
US10236725B1 (en) * 2017-09-05 2019-03-19 Apple Inc. Wireless charging system with image-processing-based foreign object detection
US11567627B2 (en) * 2018-01-30 2023-01-31 Magic Leap, Inc. Eclipse cursor for virtual content in mixed reality displays
US10769399B2 (en) * 2018-12-18 2020-09-08 Zebra Technologies Corporation Method for improper product barcode detection

Also Published As

Publication number Publication date
EP3830751A1 (en) 2021-06-09
KR20210035269A (en) 2021-03-31
US20220114807A1 (en) 2022-04-14
WO2020028116A1 (en) 2020-02-06
EP3830751A4 (en) 2022-05-04

Similar Documents

Publication Publication Date Title
CN112602091A (en) Object detection using multiple neural networks trained for different image fields
CN113065558B (en) Lightweight small target detection method combined with attention mechanism
US11195038B2 (en) Device and a method for extracting dynamic information on a scene using a convolutional neural network
Kopsiaftis et al. Vehicle detection and traffic density monitoring from very high resolution satellite video data
US10929955B2 (en) Scene-based nonuniformity correction using a convolutional recurrent neural network
Pei et al. A fast RetinaNet fusion framework for multi-spectral pedestrian detection
JP6509027B2 (en) Object tracking device, optical apparatus, imaging device, control method of object tracking device, program
JP6998554B2 (en) Image generator and image generation method
CN111402130B (en) Data processing method and data processing device
US11908160B2 (en) Method and apparatus for context-embedding and region-based object detection
CN111222395A (en) Target detection method and device and electronic equipment
CN114556268B (en) Gesture recognition method and device and storage medium
Bu et al. Pedestrian planar LiDAR pose (PPLP) network for oriented pedestrian detection based on planar LiDAR and monocular images
CN112639819A (en) Object detection using multiple sensors and reduced complexity neural networks
US11804026B2 (en) Device and a method for processing data sequences using a convolutional neural network
Lyu et al. Road segmentation using CNN with GRU
CN115239581A (en) Image processing method and related device
Zhang et al. Efficient object detection method based on aerial optical sensors for remote sensing
Zuo et al. Accurate depth estimation from a hybrid event-RGB stereo setup
CN102044079B (en) Apparatus and method for tracking image patch in consideration of scale
Zhang et al. CE-RetinaNet: A channel enhancement method for infrared wildlife detection in UAV images
Umamaheswaran et al. Stereo vision based speed estimation for autonomous driving
CN116433712A (en) Fusion tracking method and device based on pre-fusion of multi-sensor time sequence sensing results
CN114612999A (en) Target behavior classification method, storage medium and terminal
CN114842012B (en) Medical image small target detection method and device based on position awareness U-shaped network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination