CN115131756A - Target detection method and device - Google Patents

Target detection method and device

Info

Publication number
CN115131756A
CN115131756A
Authority
CN
China
Prior art keywords
image
feature
target
region
radar
Prior art date
Legal status
Pending
Application number
CN202210742731.8A
Other languages
Chinese (zh)
Inventor
方梓成
张经纬
赵显�
Current Assignee
Shanghai Goldway Intelligent Transportation System Co Ltd
Original Assignee
Shanghai Goldway Intelligent Transportation System Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Goldway Intelligent Transportation System Co Ltd filed Critical Shanghai Goldway Intelligent Transportation System Co Ltd
Priority to CN202210742731.8A
Publication of CN115131756A

Classifications

    • G PHYSICS
      • G01 MEASURING; TESTING
        • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
          • G01S 17/00 Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
            • G01S 17/88 Lidar systems specially adapted for specific applications
              • G01S 17/89 Lidar systems specially adapted for specific applications for mapping or imaging
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 Computing arrangements based on biological models
            • G06N 3/02 Neural networks
              • G06N 3/08 Learning methods
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 7/00 Image analysis
            • G06T 7/50 Depth or shape recovery
              • G06T 7/521 Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
            • G06T 7/60 Analysis of geometric attributes
          • G06T 2207/00 Indexing scheme for image analysis or image enhancement
            • G06T 2207/10 Image acquisition modality
              • G06T 2207/10028 Range image; Depth image; 3D point clouds
            • G06T 2207/20 Special algorithmic details
              • G06T 2207/20081 Training; Learning
              • G06T 2207/20084 Artificial neural networks [ANN]
            • G06T 2207/30 Subject of image; Context of image processing
              • G06T 2207/30248 Vehicle exterior or interior
                • G06T 2207/30252 Vehicle exterior; Vicinity of vehicle
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 10/00 Arrangements for image or video recognition or understanding
            • G06V 10/40 Extraction of image or video features
              • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
            • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
              • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
              • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
                • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
                  • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
              • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
            • G06V 20/00 Scenes; Scene-specific elements
              • G06V 20/50 Context or environment of the image
                • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Radar Systems Or Details Thereof (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Electromagnetism (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Geometry (AREA)
  • Optics & Photonics (AREA)
  • Life Sciences & Earth Sciences (AREA)

Abstract

The application provides a target detection method and a target detection device, relates to the technical field of unmanned driving, and can improve the accuracy of target detection. The specific scheme comprises: acquiring a visual image and point cloud data of a region to be detected, wherein the point cloud data comprises one or more radar reflection points; associating one or more image features extracted based on the visual image with the point cloud data to obtain one or more radar reflection points associated with each image feature; determining a radar feature corresponding to each image feature according to one or more radar reflection points associated with each image feature; and detecting the space state of a target object in the region to be detected according to each image characteristic and the radar characteristic corresponding to each image characteristic, wherein the target object comprises a human body and/or an object.

Description

Target detection method and device
Technical Field
The application relates to the technical field of unmanned driving, in particular to a target detection method and device.
Background
A self-driving automobile (also called an autonomous vehicle, a computer-driven automobile, or a wheeled mobile robot) is an intelligent automobile that realizes unmanned driving through a computer system. For unmanned vehicles, target detection in the driving road environment is a key technical link in realizing automatic driving perception. The vehicle's driving perception system determines the environment around the vehicle and avoids collisions with other vehicles, pedestrians and other targets, so that the vehicle can drive safely and automatically in its lane.
The object detection technology described above recognizes objects such as other vehicles or pedestrians in the surrounding environment using sensors carried by the unmanned vehicle, which may carry one or more sensors for this purpose. In a complex working environment, the information acquired by a single sensor is limited, and a single sensor alone cannot guarantee correct detection of a target under all conditions, which may lead to incorrect environment perception by the unmanned automobile. The shortcomings of individual sensors are therefore usually compensated by multi-sensor fusion, so that target detection is performed more accurately and quickly and the environment around the vehicle is perceived reliably. Among such approaches, fusion of a camera and a millimeter wave radar is one of the important technologies for automatic driving.
At present, schemes based on the fusion of a camera and a millimeter wave radar mostly project the millimeter wave radar points onto the camera image to generate regions to be detected, and then perform two-dimensional target detection in the image coordinate system. However, converting the 3D point cloud information of an object received by the millimeter wave radar into a 2D plane image consistent with the visual data introduces a large error, which degrades the accuracy of target detection.
Disclosure of Invention
The application provides a target detection method and a target detection device, which can be used for improving the accuracy of target detection.
In order to achieve the technical purpose, the following technical scheme is adopted in the application:
in a first aspect, an embodiment of the present application provides a target detection method, where the method includes: acquiring a visual image and point cloud data of a region to be detected, wherein the point cloud data comprises one or more radar reflection points; associating one or more image features extracted based on the visual image with the point cloud data to obtain one or more radar reflection points associated with each image feature; determining a radar feature corresponding to each image feature according to one or more radar reflection points associated with each image feature; and detecting the space state of a target object in the region to be detected according to each image characteristic and the radar characteristic corresponding to each image characteristic, wherein the target object comprises a human body and/or an object.
Based on the method, feature association is first performed between the visual image and the point cloud data, and feature extraction is then performed on the point cloud data according to the association result to generate the radar feature corresponding to each image feature. Finally, the two kinds of features are fused to detect the space state of the target object in the region to be detected. On the one hand, the three-dimensional spatial information embodied in the radar point cloud data is not lost while the visual image and the point cloud data are processed, each image feature can be associated with one or more radar reflection points, and the influence of isolated outliers on the association result can be reduced; therefore, the accuracy of cross-modal sensor data association between the visual image and the point cloud data can be improved. On the other hand, the three-dimensional space state of the target object can be estimated while the target object in the region to be detected is detected.
In a possible implementation manner, the associating one or more image features extracted based on the visual image with the point cloud data to obtain one or more radar reflection points associated with each image feature includes: for a first image feature of the one or more image features, determining an association probability of the first image feature with each radar reflection point according to the first image feature and point cloud data; and determining the radar reflection points with the association probability larger than or equal to the preset probability as the radar reflection points associated with the first image characteristics. It can be understood that in this implementation manner, the association probability between the image feature and each radar reflection point can be determined based on the spatial position information between the image feature and the point cloud data, and then whether the image feature is associated with the radar reflection point is determined based on the association probability, so that the accuracy of the determination process can be improved.
In another possible implementation manner, the determining, according to the first image feature and the point cloud data, the association probability of the first image feature with each radar reflection point includes: determining the relative position relationship between the first image feature and each radar reflection point in the point cloud data; determining depth information of the first image feature and depth information of each radar reflection point in the point cloud data; determining the association probability of the first image feature with each radar reflection point according to the relative position relationship, the depth information of the first image feature and the depth information of each radar reflection point in the point cloud data; and determining the radar reflection points with an association probability greater than or equal to a preset probability as the radar reflection points associated with the first image feature.
In yet another possible implementation, the association probability satisfies the following relationship:
[Formula shown in the original publication only as image BDA0003718607190000021]
where d is the depth information of a radar reflection point, μ_d is the depth mean in the depth information of the first image feature, σ_d is the corresponding depth standard deviation in the depth information of the first image feature, and the view frustum is used to indicate the spatial region determined based on the target image frame corresponding to the first image feature and the position of the image acquisition device.
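Since the relationship itself survives only as an image placeholder, the following is an illustrative reading consistent with the listed symbols (an assumption made here for clarity, not the verbatim patent formula): a frustum-gated Gaussian depth match.

```latex
P\bigl(\text{point } i \text{ associated with the first image feature}\bigr)=
\begin{cases}
\exp\!\left(-\dfrac{(d_i-\mu_d)^2}{2\sigma_d^2}\right), & \text{point } i \text{ lies inside the view frustum},\\[6pt]
0, & \text{otherwise}.
\end{cases}
```

Under this reading, a radar reflection point can only be associated if it falls inside the view frustum of the target image frame, and its association probability decays as its depth d_i moves away from the depth mean μ_d of the first image feature.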
In another possible implementation manner, the detecting a spatial state of the target object in the region to be detected according to each image feature and the radar feature corresponding to each image feature includes: for a first image feature of the one or more image features, determining a joint feature of the first image feature and a radar feature corresponding to the first image feature; and determining the space state of the target object in the region to be detected according to the first image characteristic and the joint characteristic.
It can be understood that in the implementation manner, the fusion feature of the image feature and the radar feature is determined at first, and then the target detection is performed based on the fusion feature, so that the relevance between the visual image and the point cloud data on the spatial position can be fully utilized, and the accuracy of the target detection is improved.
In another possible implementation manner, the determining the spatial state of the target object in the region to be detected according to the first image feature and the joint feature includes: determining one or more items of target categories, orientation angles, sizes and central offsets of target objects in the region to be detected according to the first image characteristics; determining speed information and direction information of a target object in the region to be detected according to the joint characteristics; determining the depth information of a target object in the region to be detected according to the first image characteristic and the joint characteristic; and determining the space state of the target object according to one or more of the target category, the orientation angle, the size, the center offset, the speed information, the direction information and the depth information of the target object in the region to be measured.
It is to be understood that, in this implementation, the target detection apparatus may perform grouped prediction of the information to be predicted for the target object. For example, image features extracted from the visual image may be used to predict information such as the target category, orientation angle, size and center offset of the target object, while the joint feature of the image feature and the radar data may be used to predict the speed information and direction information of the target object. In this way, the respective advantages of the data collected by the image acquisition device and by the radar can be fully exploited, and the accuracy of prediction can be improved.
In yet another possible implementation, determining one or more of a target class, an orientation angle, a size, and a center offset of a target object within a region to be measured according to the first image feature includes: inputting the first image characteristics into a first prediction module to obtain one or more items of target categories, orientation angles, sizes and central offsets of target objects in the region to be detected; the first prediction module comprises one or more first sub-modules, one first sub-module comprises a 3x3 convolution kernel, a linear rectification unit and a 1x1 convolution kernel, and the one first sub-module is used for outputting the target class, the orientation angle, the size or the central offset of a target object in the region to be detected.
In another possible implementation manner, the determining the speed information and the direction information of the target object in the region to be measured according to the joint feature includes: inputting the joint characteristics into a second prediction module to obtain speed information and direction information of a target object in the region to be detected; the second prediction module comprises a 1 × 1 convolution kernel, a linear rectification unit, a 1 × 1 convolution kernel, a linear rectification unit and a 1 × 1 convolution kernel.
In another possible implementation manner, the determining depth information of the target object in the region to be measured according to the first image feature and the joint feature includes: inputting the first image characteristic and the joint characteristic into a third prediction module to obtain depth information of a target object in the region to be detected; the third prediction module comprises one or more first sub-modules and second sub-modules, the first sub-modules comprise 3x3 convolution kernels, linear rectification units and 1x1 convolution kernels, and the second sub-modules comprise 1x1 convolution kernels, linear rectification units, 1x1 convolution kernels, linear rectification units and 1x1 convolution kernels.
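For illustration, the three prediction modules described above can be sketched as follows. This is a minimal PyTorch-style sketch under stated assumptions: only the 3x3-ReLU-1x1 and 1x1-ReLU-1x1-ReLU-1x1 orderings come from the text, while the channel widths, the output dimensions of each head, and the way the first and second sub-modules are chained inside the third prediction module are placeholders.

```python
import torch
import torch.nn as nn


def first_submodule(in_ch, out_ch, mid_ch=256):
    # First sub-module: 3x3 convolution -> linear rectification unit -> 1x1 convolution.
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, out_ch, kernel_size=1),
    )


def second_submodule(in_ch, out_ch, mid_ch=128):
    # Second sub-module: 1x1 conv -> ReLU -> 1x1 conv -> ReLU -> 1x1 conv.
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, kernel_size=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, mid_ch, kernel_size=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, out_ch, kernel_size=1),
    )


class PredictionHeads(nn.Module):
    """Grouped prediction: image features -> category/angle/size/offset,
    joint features -> speed/direction, both together -> depth."""

    def __init__(self, img_ch=256, joint_ch=512, num_classes=3):
        super().__init__()
        # First prediction module: one first sub-module per predicted quantity.
        self.category = first_submodule(img_ch, num_classes)
        self.angle = first_submodule(img_ch, 2)    # e.g. (sin, cos) of the orientation angle
        self.size = first_submodule(img_ch, 3)     # e.g. width / height / length
        self.offset = first_submodule(img_ch, 2)   # center offset
        # Second prediction module: speed and direction from the joint feature.
        self.velocity = second_submodule(joint_ch, 2)
        # Third prediction module: depth from the image feature and the joint feature.
        self.depth = nn.Sequential(
            first_submodule(img_ch + joint_ch, 64),
            second_submodule(64, 1),
        )

    def forward(self, img_feat, joint_feat):
        fused = torch.cat([img_feat, joint_feat], dim=1)
        return {
            "category": self.category(img_feat),
            "angle": self.angle(img_feat),
            "size": self.size(img_feat),
            "offset": self.offset(img_feat),
            "velocity": self.velocity(joint_feat),
            "depth": self.depth(fused),
        }
```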
In a second aspect, an embodiment of the present application provides a target detection apparatus, which includes an obtaining module, a feature associating module, a feature extracting module, and a target detection module, where the obtaining module is configured to obtain a visual image and point cloud data of a region to be detected; the system comprises a feature association module, a point cloud data acquisition module and a data processing module, wherein the feature association module is used for associating one or more image features extracted based on a visual image with the point cloud data to obtain one or more radar reflection points associated with each image feature; the feature extraction module is used for determining radar features corresponding to each image feature according to one or more radar reflection points associated with each image feature; and the target detection module is used for detecting the space state of a target object in the region to be detected according to each image characteristic and the radar characteristic corresponding to each image characteristic, wherein the target object comprises a human body and/or an object.
In a possible implementation manner, the feature association module is specifically configured to: for a first image feature of the one or more image features, determining an association probability of the first image feature with each radar reflection point according to the first image feature and point cloud data; and determining the radar reflection points with the association probability greater than or equal to the preset probability as the radar reflection points associated with the first image characteristics.
In another possible implementation manner, the feature association module is specifically configured to: determining the relative position relation between the first image characteristic and each radar reflection point in the point cloud data; determining depth information of the first image feature and depth information of each radar reflection point in the point cloud data; and determining the association probability of the first image characteristic and each radar reflection point according to the relative position relation, the depth information of the first image characteristic and the depth information of each radar reflection point in the point cloud data.
In yet another possible implementation, the association probability satisfies the following relationship:
[Formula shown in the original publication only as image BDA0003718607190000031]
where d is the depth information of a radar reflection point, μ_d is the depth mean in the depth information of the first image feature, σ_d is the corresponding depth standard deviation in the depth information of the first image feature, and the view frustum is used to indicate the spatial region determined based on the target image frame corresponding to the first image feature and the position of the image acquisition device.
In another possible implementation manner, the object detection module is specifically configured to: and for a first image feature in the one or more image features, determining a joint feature of the first image feature and a radar feature corresponding to the first image feature, and determining a spatial state of a target object in the region to be detected according to the first image feature and the joint feature.
In another possible implementation manner, the object detection module is specifically configured to: determining one or more items of target categories, orientation angles, sizes and central offsets of target objects in the region to be detected according to the first image characteristics; determining the speed information and the direction information of the target object in the region to be detected according to the joint characteristics; determining the depth information of a target object in the region to be detected according to the first image characteristic and the joint characteristic; and determining the space state of the target object according to one or more items of the target class, the orientation angle, the size, the central offset, the speed information, the direction information and the depth information of the target object in the region to be measured.
In another possible implementation manner, the object detection module is specifically configured to: inputting the first image characteristics into a first prediction module to obtain one or more items of target categories, orientation angles, sizes and central offsets of target objects in the region to be detected; the first prediction module comprises one or more first sub-modules, one first sub-module comprises a 3x3 convolution kernel, a linear rectification unit and a 1x1 convolution kernel, and the first sub-module is used for outputting the target category, the orientation angle, the size or the central offset of a target object in a region to be detected.
In another possible implementation manner, the object detection module is further specifically configured to: inputting the joint characteristics into a second prediction module to obtain speed information and direction information of a target object in the region to be detected; the second prediction module comprises a 1 × 1 convolution kernel, a linear rectification unit, a 1 × 1 convolution kernel, a linear rectification unit and a 1 × 1 convolution kernel.
In another possible implementation manner, the object detection module is further specifically configured to: inputting the first image characteristic and the joint characteristic into a third prediction module to obtain depth information of a target object in the region to be detected; the third prediction module comprises one or more first sub-modules and second sub-modules, the first sub-modules comprise 3x3 convolution kernels, linear rectification units and 1x1 convolution kernels, and the second sub-modules comprise 1x1 convolution kernels, linear rectification units, 1x1 convolution kernels, linear rectification units and 1x1 convolution kernels.
In a third aspect, the present application provides an electronic device comprising a memory and a processor. The memory is coupled to the processor. The memory is for storing computer program code comprising computer instructions. The computer instructions, when executed by a processor, cause an electronic device to perform a method of object detection as set forth in the first aspect and any of its possible designs.
In a fourth aspect, the present application provides a chip system, which is applied to a target detection apparatus; the chip system includes one or more interface circuits, and one or more processors. The interface circuit and the processor are interconnected through a line; the interface circuit is configured to receive signals from the memory of the object detection device and to send signals to the processor, the signals including computer instructions stored in the memory. The computer instructions, when executed by the processor, cause the electronic device to perform a method of object detection as set forth in the first aspect and any of its possible designs.
In a fifth aspect, the present application provides a computer-readable storage medium storing computer instructions that, when executed on an electronic device, cause the electronic device to perform the object detection method according to the first aspect and any one of its possible design manners.
In a sixth aspect, the present application provides a computer program product comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the object detection method according to the first aspect and any one of its possible design approaches.
For details of the second aspect to the sixth aspect and their various implementations, reference may be made to the first aspect and its various implementations in this application; moreover, for the beneficial effects of the second aspect to the sixth aspect and their various implementations, reference may be made to the analysis of beneficial effects in the first aspect and its various implementations, and details are not repeated here.
Drawings
Fig. 1 is a schematic composition diagram of a target detection system according to an embodiment of the present disclosure;
fig. 2 is a schematic view of a target detection scenario provided in an embodiment of the present application;
fig. 3 is a schematic structural diagram of an object detection apparatus according to an embodiment of the present disclosure;
fig. 4 is a flowchart of a target detection method according to an embodiment of the present application;
FIG. 5 is a flow chart of another method for detecting an object according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram of a residual error network according to an embodiment of the present application;
FIG. 7 is a schematic diagram illustrating a combination module of image features and point cloud data according to an embodiment of the present disclosure;
FIG. 8 is a schematic view of a view frustum provided in an embodiment of the present application;
FIG. 9 is a flow chart of another method for detecting an object according to an embodiment of the present disclosure;
fig. 10 is a schematic diagram illustrating a target detection module according to an embodiment of the present disclosure;
FIG. 11 is a block diagram of a prediction module according to an embodiment of the present disclosure;
fig. 12 is a schematic structural diagram of an object detection apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the description of this application, "/" means "or" unless otherwise stated; for example, A/B may mean A or B. "And/or" herein merely describes an association between associated objects and means that three relationships are possible; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. Further, "at least one" means one or more, and "a plurality" means two or more. The terms "first", "second", and the like do not limit the number or the execution order, nor do they require the items so named to be different.
It is noted that, in the present application, words such as "exemplary" or "for example" are used to mean exemplary, illustrative, or descriptive. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present relevant concepts in a concrete fashion.
For the sake of understanding, the basic concepts of some terms or techniques related to the embodiments of the present invention will be briefly described and explained.
1. Target detection
Object detection refers to the process of finding an object from a scene (e.g., an image) and determining the location of the object. Object detection has wide application in many fields of life, such as the fields of automatic driving, driving assistance and early warning, etc. In the process of target detection and identification, multi-sensor fusion is usually required, for example, data collected by a laser radar, a millimeter wave radar, a vision sensor, an infrared sensor, etc. are fused to obtain vehicle surrounding environment information, that is, detection of a target object in the vehicle surrounding environment. Illustratively, the target object may be any object in the vehicle surroundings, such as a vehicle, a person, a tree, a building, and so on.
2. Convolutional Neural Networks (CNN)
Convolutional neural networks are a class of feed-forward neural networks that contain convolution calculations and have a deep structure, and are one of the representative algorithms of deep learning. Convolutional neural networks can be applied to computer vision tasks such as image recognition (image classification), object recognition, action recognition, pose estimation, and neural style transfer, and can also be applied to natural language processing (NLP).
In general, a convolutional neural network includes an input layer, a hidden layer, and an output layer.
Wherein the input layer of the convolutional neural network can process multidimensional data. Taking image processing as an example, the input layer may receive pixel values (three-dimensional arrays) of an image, that is, values of two-dimensional pixels and RGB channels on a plane.
The hidden layers of a convolutional neural network include one or more convolutional layers, one or more pooling layers, and one or more fully-connected layers. A convolutional layer performs feature extraction on its input data. A convolutional layer is typically followed by a pooling layer: after feature extraction by the convolutional layer, the output data is passed to the pooling layer for feature selection and information filtering. Each node of a fully-connected layer is connected to all nodes of the previous layer and integrates the acquired features; the fully-connected layer plays the role of a classifier in the whole convolutional neural network.
The structure and working principle of the output layer of a convolutional neural network are the same as those of the output of a traditional feed-forward neural network. For example, for an image-classification convolutional neural network, the output layer uses a logistic function or a normalized exponential function (softmax function) to output a classification label, such as person, scenery, object, and so on.
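For illustration only, a minimal CNN with the three kinds of hidden layers and a softmax output described above might look like the following sketch (the layer sizes and the 224x224 RGB input are arbitrary choices, not the network used in this application):

```python
import torch
import torch.nn as nn


class TinyCNN(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer: feature extraction
            nn.ReLU(),
            nn.MaxPool2d(2),                             # pooling layer: selection and information filtering
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)  # fully-connected layer acts as the classifier

    def forward(self, x):
        # x: (N, 3, 224, 224) pixel values of an RGB image (the input layer)
        x = self.features(x)
        x = torch.flatten(x, 1)
        logits = self.classifier(x)
        return torch.softmax(logits, dim=1)  # normalized exponential (softmax) function at the output layer
```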
3. Radar apparatus
Radar is a device that can detect an object using electromagnetic waves. The transmitting antenna of the radar converts a high-frequency current signal generated by a transmitter circuit or a guided wave on a transmission line into an electromagnetic wave which can be transmitted in space and has a certain specific polarization mode to be transmitted along a preset direction, and when the electromagnetic wave meets an obstacle in the advancing direction, part of the electromagnetic wave can be reflected back along the opposite direction of the transmitting direction. At this time, the receiving antenna of the radar may receive the reflected electromagnetic wave, convert it into a high-frequency current signal or a transmission line guided wave, and extract state information such as a distance, a speed, and an angle of the target by performing subsequent processing on the obtained echo signal.
Illustratively, the radar may be comprised of a radar transmitter, a radar receiver, and an antenna.
A radar transmitter is a radio device that provides a high-power radio frequency signal for a radar, and can generate a high-power radio frequency signal, i.e., an electromagnetic wave, whose carrier is modulated. Depending on the modulation scheme, transmitters can be classified into continuous wave transmitters and pulse transmitters. The transmitter consists of a primary radio frequency oscillator and a pulse modulator.
The radar receiver is a device for frequency conversion, filtering, amplification and demodulation in radar. The weak high frequency signals received by the antenna are selected from the accompanying noise and interference by appropriate filtering, and after amplification and detection, are used for target detection, display or other radar signal processing.
An antenna is a device used in radar equipment to transmit or receive electromagnetic waves and determine the detection direction thereof. When in emission, the energy is intensively radiated to the direction needing to be irradiated; during reception, echoes of the probe direction are received and the azimuth and/or elevation of the target is resolved.
4. Point cloud data
Point cloud data is a massive set of points that expresses the spatial distribution of targets and the characteristics of target surfaces under the same spatial reference system: once the spatial position of each reflection point in the region to be detected, or on the surface of the object to be detected, has been obtained, the resulting set of points is called a point cloud. The spatial reference system may be referred to as the radar coordinate system.
The above introduces the technical terms involved in the embodiments of the present disclosure; they will not be explained again below.
As described in the background, detecting targets in the surrounding environment based on the fusion of a camera and a millimeter wave radar has become one of the common approaches in automatic driving. The millimeter wave radar has a long detection range and high detection accuracy and can reflect the three-dimensional spatial information of the environment, but it is weak at describing detail; a visual sensor (such as a camera) cannot reflect the three-dimensional spatial information of the environment, but excels at describing detail and color, which makes up for the disadvantage of the millimeter wave radar in object recognition. However, the current fusion technology based on the camera and the millimeter wave radar has a large error, which leads to low accuracy of target detection.
In view of the above, the present application provides a target detection method, which includes: firstly, acquiring a visual image and point cloud data of a region to be detected; performing feature extraction on the obtained visual image to obtain one or more image features; determining radar features corresponding to each image feature according to one or more image features and point cloud data; and detecting the space state of a target object in the region to be detected according to each image characteristic and the radar characteristic corresponding to each image characteristic, wherein the target object comprises a human body and/or an object. Therefore, in the process of processing the visual image and the point cloud data, the three-dimensional space information embodied by the radar point cloud data is not lost, so that the accuracy of cross-modal sensor data association between the visual image and the point cloud data can be improved.
The method provided by the embodiment of the application can be applied to an unmanned aerial vehicle navigation system, an unmanned system, a driving auxiliary early warning system and the like, for example, pedestrian collision early warning (PCW), forward collision early warning (FCW), lane departure early warning (LDW), vehicle distance detection and warning (HMW) and the like are included.
As shown in fig. 1, an embodiment of the present application provides a schematic diagram of an object detection system. The object detection system 100 includes: an image acquisition device 10, a radar 20, and a control device 30. Wherein the image acquisition device 10 and the radar 20 are connected to the control device 30, respectively.
It should be understood that the above-mentioned connection mode may be a wireless connection, such as a bluetooth connection, a Wi-Fi connection, etc.; alternatively, the connection may be a wired connection, for example, an optical fiber connection, and the like, but is not limited thereto.
The image capturing device 10 is any one of a camera, a video camera, a scanner, or other devices with a photographing function (e.g., a mobile phone, a tablet computer, etc.).
Alternatively, the image pickup apparatus 10 may include a LENS (LENS) and an image sensor. Things in the field of view of the image acquisition device 10 are projected onto an image sensor through an optical image generated by a LENS (LENS), the image sensor converts the optical image into an electric signal, and a visual image of the region to be measured is obtained after the processing processes such as analog-to-digital (a/D) conversion and the like.
The radar 20 may be any one or combination of laser radar sensors, millimeter wave radar sensors, and the like. The radar 20 may detect a target in the region to be detected by using electromagnetic waves, so as to obtain radar point cloud data.
For example, as shown in fig. 2, a radar disposed on a vehicle a may emit a probe electromagnetic wave to an obstacle B and receive an echo signal generated by the obstacle B reflecting the probe electromagnetic wave, that is, the above-mentioned radar point cloud data.
In some embodiments of the application, a millimeter wave radar with strong anti-interference capability, strong resolving capability and high measurement precision is adopted.
The millimeter wave radar is a radar working in a millimeter wave band (millimeter wave), and can transmit signals with the wavelength of 1-10mm and the frequency of 30-300 GHZ. In the electromagnetic spectrum, such wavelengths are considered short wavelengths, which means high accuracy. Illustratively, a millimeter wave system operating at a frequency of 76-81GHz (corresponding to a wavelength of about 4mm) will be able to detect movements as small as a fraction of a millimeter.
The control device 30 is configured to process the visual image of the region to be detected and the point cloud data by using the target detection method of the embodiment of the present application, so as to detect spatial information of the target object in the region to be detected, where the spatial information may be used to reflect a position and/or a category attribute of the target. The target detection method of the embodiment of the application can improve the accuracy of target detection, and the specific implementation manner of the method can be referred to the explanation of the method embodiment described below.
In a possible implementation, the image acquisition device 10, the radar 20 and the control device 30 may be provided on the same device, for example, a vehicle, a drone, or the like.
In another possible implementation manner, the image capturing device 10 and the radar 20 may be disposed on the same device, for example, a vehicle, an unmanned aerial vehicle, and the like, the control device 30 may be a terminal device connected to the image capturing device 10 and the radar 20, respectively, and the control device 30 may communicate with the device on which the image capturing device 10 and the radar 20 are disposed to obtain the visual image and the point cloud data of the region to be measured.
The terminal device in the embodiment of the present application may be, for example, a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a Personal Digital Assistant (PDA), an Augmented Reality (AR) \ Virtual Reality (VR) device, an intelligent remote controller, and the like. The system can be used for man-machine interaction with a user through one or more modes of a keyboard, a touch pad, a touch screen, a remote controller, voice interaction or handwriting equipment and the like. Also, in the embodiment of the present application, the terminal device may have an interface for communicating with a cellular network and/or a Wireless Local Area Network (WLAN).
The embodiment of the present application further provides a target detection device, which is the execution body of the target detection method. The target detection device is an electronic device with data processing capability. For example, the target detection device may be the control device 30 in the target detection system 100, a functional module in the control device 30, or any computing device connected to the control device 30, and so on. The embodiments of the present application do not limit this.
A hardware configuration of the object detection apparatus 200 will be described with reference to fig. 3.
As shown in FIG. 3, the object detection device 200 includes a processor 210, a communication line 220, and a communication interface 230.
Optionally, the object detection apparatus 200 may further include a memory 240. The processor 210, the memory 240 and the communication interface 230 may be connected via a communication line 220.
The processor 210 may be a Central Processing Unit (CPU), a general-purpose processor, a Network Processor (NP), a Digital Signal Processor (DSP), a microprocessor, a microcontroller, a Programmable Logic Device (PLD), or any combination thereof. The processor 210 may also be any other device with a processing function, such as a circuit, a device, or a software module, without limitation.
In one example, processor 210 may include one or more CPUs, such as CPU0 and CPU1 in fig. 3.
As an alternative implementation, the object detection apparatus 200 includes a plurality of processors, for example, the processor 270 may be included in addition to the processor 210. A communication line 220 for transmitting information between the respective components included in the object detection apparatus 200.
A communication interface 230 for communicating with other devices or other communication networks. The other communication network may be an ethernet, a Radio Access Network (RAN), a Wireless Local Area Network (WLAN), or the like. Communication interface 230 may be a module, circuitry, transceiver, or any device capable of enabling communication.
A memory 240 for storing instructions. Wherein the instructions may be a computer program.
The memory 240 may be a read-only memory (ROM) or another type of static storage device that can store static information and/or instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and/or instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or another optical disc storage, an optical disc storage (including a compact disc, a laser disc, an optical disc, a digital versatile disc, a blu-ray disc, etc.), a magnetic disk storage medium or another magnetic storage device, and the like, without limitation.
It should be noted that the memory 240 may exist independently from the processor 210 or may be integrated with the processor 210. The memory 240 may be used for storing instructions or program code or some data or the like. The memory 240 may be located inside the object detection apparatus 200 or outside the object detection apparatus 200, which is not limited.
The processor 210 is configured to execute the instructions stored in the memory 240 to implement the target detection method provided in the embodiments described below. For example, when the object detection apparatus 200 is a terminal, or a chip or a system on a chip in a terminal, the processor 210 may execute the instructions stored in the memory 240 to implement the target detection method provided in the present application.
As an alternative implementation, the object detection apparatus 200 further includes an output device 250 and an input device 260. The output device 250 may be a display screen, a speaker, or the like capable of outputting data of the object detection apparatus 200 to a user. The input device 260 may be a device capable of inputting data to the object detection apparatus 200, such as a keyboard, a mouse, a microphone, or a joystick.
It is noted that the structure shown in fig. 3 does not constitute a limitation of the control device, which may comprise more or less components than those shown in fig. 3, or a combination of some components, or a different arrangement of components than those shown in fig. 3.
The following describes the target detection method provided by the present application in detail with reference to the drawings of the specification.
As shown in fig. 4, an embodiment of the present application provides an object detection method, which is optionally performed by the object detection apparatus shown in fig. 3, and includes the following steps:
s101, the target detection device obtains a visual image and point cloud data of a region to be detected.
The region to be detected is a region that can be detected by the target detection system shown in fig. 1. This area is within the area where the image capturing device 10 is able to capture pictures and, likewise, within the area that the radar 20 is able to detect.
For example, in the target detection system of an unmanned vehicle, the target detection system in the vehicle continuously detects obstacles around the vehicle to assist the vehicle to drive and sense the surrounding environment, so that the unmanned vehicle can safely and automatically drive on a lane. In the object detection system, the area to be detected is an area in which object detection is being performed in the surrounding environment of the vehicle. Moreover, the visual image may be obtained by image acquisition of the area to be detected by image acquisition equipment mounted on the vehicle, and the image acquisition equipment may send the obtained visual image to the target detection device of this embodiment. The point cloud data is data obtained by detecting an area to be detected by a radar installed on the vehicle, and the radar can also send the obtained point cloud data to the target detection device of the embodiment.
In practical use, the radar usually periodically emits electromagnetic waves, and when the electromagnetic waves encounter an obstacle in the forward direction, part of the electromagnetic waves are reflected back in the direction opposite to the emission direction. The electromagnetic waves reflected by the target object in the region to be detected are echo signals received by the radar, and the required radar point cloud data can be obtained through signal processing. Optionally, the signal processing on the echo signal includes Fast Fourier Transform (FFT), Constant False Alarm Rate (CFAR), and the like.
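As a rough illustration of the echo-signal processing chain mentioned above, the following sketch shows a range FFT followed by a simple cell-averaging CFAR detector (a simplified assumption: real radar pipelines also include Doppler and angle processing, and all parameters here are arbitrary):

```python
import numpy as np


def range_fft(echo, n_fft=None):
    """Fast Fourier Transform over the fast-time samples, returning the range power spectrum."""
    spectrum = np.fft.fft(echo, n=n_fft, axis=-1)
    return np.abs(spectrum) ** 2


def ca_cfar(power, guard=2, train=8, scale=4.0):
    """Cell-averaging constant false alarm rate detection over a 1-D power profile."""
    detections = []
    n = len(power)
    for i in range(n):
        lo, hi = max(0, i - guard - train), min(n, i + guard + train + 1)
        # training cells on both sides of the cell under test, excluding the guard cells
        window = np.concatenate([power[lo:max(0, i - guard)], power[min(n, i + guard + 1):hi]])
        if window.size and power[i] > scale * window.mean():
            detections.append(i)  # each detected range cell yields a candidate reflection point
    return detections
```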
The target object refers to any target in the region to be detected, such as a vehicle, a person, or a tree. Each frame of point cloud data acquired by the radar comprises a plurality of radar reflection point data. One radar reflection point data may include three-dimensional coordinate information of the radar reflection point, or alternatively two-dimensional coordinate information (i.e., plane coordinate information). Optionally, the radar reflection point data may further include speed information, the reflection cross-sectional area, the signal-to-noise ratio of the reflected signal, time information, reflection intensity information, and the like. Before the fusion processing, the three-dimensional space coordinates of each radar reflection point need to be converted into the coordinate system of the image acquisition device using the calibrated extrinsic parameters of the radar device and the image acquisition device; the converted coordinates are expressed as (x, y, z), where the z-axis component is also called the depth d.
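The per-point fields and the radar-to-camera coordinate conversion described above could be organised as in the following sketch (the field names, and passing the calibrated extrinsics as a rotation R and translation t, are assumptions for illustration):

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class RadarPoint:
    xyz: np.ndarray         # 3D coordinates in the radar coordinate system
    velocity: float = 0.0   # speed information
    rcs: float = 0.0        # reflection cross-sectional area
    snr: float = 0.0        # signal-to-noise ratio of the reflected signal
    timestamp: float = 0.0  # time information


def radar_to_camera(points_radar: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Convert (N, 3) radar-frame points into the image acquisition device coordinate
    system using the calibrated extrinsics; the returned z column is the depth d."""
    return points_radar @ R.T + t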
Optionally, the point cloud data and the visual image of the region to be detected may be synchronously acquired data.
It should be noted that synchronous acquisition may be understood to mean that the radar device 20 and the image acquisition device 10 in the target detection system 100 acquire data simultaneously, or that the deviation between the frame rates at which they acquire data is within a preset range. For example, the radar periodically acquires point cloud data at a first frame rate and the image acquisition device periodically acquires visual images at a second frame rate; the point cloud data and the visual image can be considered synchronously acquired data when the difference between the first frame rate and the second frame rate is smaller than a preset frame rate threshold.
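A minimal check of the frame-rate criterion described above might look like this (the threshold value is arbitrary):

```python
def is_synchronized(radar_fps: float, camera_fps: float, max_diff: float = 1.0) -> bool:
    """Treat point cloud data and visual images as synchronously acquired when the
    deviation between the two acquisition frame rates is within the preset range."""
    return abs(radar_fps - camera_fps) < max_diff
```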
S102, the target detection device associates one or more image features extracted based on the visual image with the point cloud data to obtain one or more radar reflection points associated with each image feature.
In some embodiments, the target detection device may perform feature extraction on the visual image to obtain one or more image features. And then, associating each image feature with the point cloud data to obtain one or more radar reflection points associated with each image feature.
Optionally, as shown in fig. 5, the step S102 may specifically include S1021 and S1022:
S1021, the target detection device performs feature extraction on the visual image to obtain one or more image features.
The feature extraction refers to a method and a process for extracting characteristic information from an image. For example, the object detection device may extract the characteristic information in the visual image by analyzing and transforming the visual image.
The extracted feature information may include low-level information such as the outline and color of the target object, and may further include high-level feature information that describes the target object more completely.
As a possible implementation manner, the target detection apparatus may input the visual image into a pre-trained feature extraction model for feature extraction, so as to obtain one or more image features of the visual image.
Optionally, the feature extraction model may adopt a convolutional neural network to perform feature extraction on the acquired visual image. The feature extraction model may adopt an encoder-decoder structure: the encoder in the first half uses down-sampling operations and performs preliminary feature extraction on the visual image through several feature extraction layers to obtain feature information at different levels, and the decoder in the second half uses up-sampling operations to splice and fuse the feature information of the different levels, obtaining the fused feature information.
Alternatively, the first half of the encoder of the feature extraction model may be implemented by using a residual neural network (ResNet).
A residual network is a type of neural network that contains skip connections, also called short-cut connections: a direct path is added to the network so that the input information is passed straight to a later network layer, and that later layer then only needs to learn the residual of the output of the earlier layer rather than the complete information output by that layer. Introducing residual connections in this way avoids the vanishing-gradient phenomenon in the network and speeds up training.
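A minimal bottleneck residual block of the 1x1-3x3-1x1 form used in the network below could be sketched as follows (batch normalization and the projection on the skip path are standard additions assumed here, not details taken from the text):

```python
import torch
import torch.nn as nn


class Bottleneck(nn.Module):
    """1x1 -> 3x3 -> 1x1 convolutions plus a skip (short-cut) connection, so the block
    only has to learn the residual of the previous layer's output."""

    def __init__(self, in_ch, mid_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False), nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False), nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),
        )
        # Project the skip path when the spatial size or channel count changes.
        self.skip = (nn.Identity() if stride == 1 and in_ch == out_ch
                     else nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.skip(x))
```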
Fig. 6 shows a schematic diagram of a residual network comprising five convolution modules. The first convolution module, convolution module 1 in fig. 6, convolves the input visual image with 64 convolution filters using 7x7 kernels and a stride of 2, yielding image features at 1/2 the original visual image size with 64 channels. The second convolution module, convolution module 2, first pools the feature map output by convolution module 1 to obtain features at 1/4 the original image size with 64 channels, and then passes them through three consecutive identical residual blocks to obtain image features at 1/4 the original visual image size with 256 channels. Each of these residual blocks consists of 64 1x1 convolution kernels, 64 3x3 convolution kernels and 256 1x1 convolution kernels, and its output is obtained after the three convolutions. The third convolution module, convolution module 3, has the same structure as convolution module 2 and contains four residual blocks whose three sets of convolutions have 128, 128 and 512 filters respectively, producing a feature map at 1/8 the original image size with 512 channels. The fourth convolution module, convolution module 4, again has the same structure, with residual blocks whose three sets of convolutions have 256, 256 and 1024 filters respectively, for a total of 23 residual blocks and 69 convolutional layers; it convolves the feature map output by convolution module 3 to obtain image features at 1/16 the original visual image size with 1024 channels. The fifth convolution module, convolution module 5, adopts the same structure, with residual blocks whose three sets of convolutions have 512, 512 and 2048 filters respectively, for a total of 3 residual blocks and 9 convolution layers; it convolves the feature map output by convolution module 4 to obtain image features at 1/32 the original visual image size with 2048 channels.
Further, the second-half decoder of the feature extraction model may include three deconvolution layers that upsample the obtained feature map at 1/32 of the original image size with 2048 channels. Optionally, the stride of the upsampling may be 2, that is, each upsampling enlarges the feature map by a factor of 2. After three upsampling operations, image features at 1/4 of the original visual image size with 256 channels are therefore obtained. These image features can be expressed as f_img, a feature map with 256 channels and spatial size W_f × H_f,
where W_f = W_img/4, H_f = H_img/4, W_img is the width of the original visual image, and H_img is the height of the original visual image.
Alternatively, the three deconvolution layers of the decoder may be a combination of 1 × 1 convolution kernels, 3 × 3 convolution kernels, and 1 × 1 convolution kernels.
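As a concrete illustration, the following is a minimal PyTorch sketch of such an encoder-decoder feature extractor. It uses torchvision's ResNet-101 as a stand-in for the five convolution modules and three stride-2 deconvolution stages (each a 1 × 1 convolution, a 3 × 3 deconvolution, and a 1 × 1 convolution) for the decoder; the splicing and fusion of multi-level feature information is omitted for brevity, and the class name, ReLU placement, and intermediate channel widths are assumptions rather than details given in the text.

```python
import torch
import torch.nn as nn
import torchvision


class ImageFeatureExtractor(nn.Module):
    """Encoder-decoder feature extractor sketch.

    Encoder: ResNet-101 style backbone whose five convolution modules reduce
    the input to 1/32 resolution with 2048 channels.
    Decoder: three stride-2 upsampling stages that restore the map to 1/4
    resolution with 256 channels, matching the image features f_img above.
    """

    def __init__(self):
        super().__init__()
        backbone = torchvision.models.resnet101(weights=None)  # torchvision >= 0.13
        # Keep the five convolution modules (conv1 ... layer4), drop avgpool/fc.
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])

        def up_block(c_in, c_out):
            # One decoder stage: 1x1 conv, 3x3 stride-2 deconvolution, 1x1 conv.
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=1),
                nn.ConvTranspose2d(c_out, c_out, kernel_size=3, stride=2,
                                   padding=1, output_padding=1),
                nn.Conv2d(c_out, c_out, kernel_size=1),
                nn.ReLU(inplace=True),
            )

        self.decoder = nn.Sequential(
            up_block(2048, 1024),  # 1/32 -> 1/16
            up_block(1024, 512),   # 1/16 -> 1/8
            up_block(512, 256),    # 1/8  -> 1/4
        )

    def forward(self, image):
        # image: (B, 3, H_img, W_img) -> f_img: (B, 256, H_img/4, W_img/4)
        return self.decoder(self.encoder(image))


f_img = ImageFeatureExtractor()(torch.randn(1, 3, 512, 512))
print(f_img.shape)  # torch.Size([1, 256, 128, 128])
```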
S1022, the target detection device determines one or more radar reflection point data in the point cloud data associated with each image feature.
It should be noted that, according to the description of the visual image of the region to be detected and the point cloud data in step S101, the target object in the region to be detected reflects the electromagnetic waves emitted by the radar, and this reflection produces the point cloud data received by the radar. The visual image of the region to be detected likewise contains an image of the target object, so the image features obtained by feature extraction on the visual image also reflect the visual characteristics of the target object in the region to be detected. Thus, one image feature and one or more radar reflection point data in the point cloud data may indicate the same position of the same target object. It should be understood that such an image feature may accordingly be associated with one or more radar reflection point data in the point cloud data.
Thus, to determine the radar feature to which each image feature corresponds, the target detection apparatus may first determine one or more radar reflection point data in the point cloud data associated with each image feature.
In some embodiments, for each extracted image feature, the target detection device determines one or more radar reflection points in the point cloud data associated with the image feature from the image feature and the point cloud data.
Alternatively, the target detection apparatus may perform the following steps S11 to S14 to determine one or more radar-reflecting points associated with each image feature.
S11, the target detection device determines the relative position relation between the image characteristics and each radar reflection point in the point cloud data.
Alternatively, as shown in fig. 7 (a), the target detection apparatus may first perform a preliminary process on the image feature through the first processing module 71 to generate a target image frame corresponding to the image feature.
The target image frame corresponding to each image feature may include a 4-dimensional vector, which is the abscissa and ordinate of the upper left point of the rectangular frame corresponding to the image feature in the visual image coordinate system, and the width and height of the rectangular frame, respectively.
Alternatively, as shown in (b) of fig. 7, the first processing module 71 may have a structure of a combination of a 3 × 3 convolution kernel, a Linear rectification Unit (ReLU) Unit, and a 1 × 1 convolution kernel.
The ReLU unit, also called a rectified linear unit, uses the ReLU function, an activation function commonly used in artificial neural networks. It usually refers to the nonlinear ramp function and its variants, belongs to the class of nonlinear activation functions, and, from a biological perspective, more closely models how brain neurons are activated by incoming signals.
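A minimal sketch of one way the first processing module could be realized is shown below: a 3 × 3 convolution, a ReLU, and a 1 × 1 convolution that map the image feature map to a 4-channel output interpreted per location as the target image frame. The class name, the 256 input channels, and the dense per-location output are assumptions not stated in the text.

```python
import torch
import torch.nn as nn


class BoxHead(nn.Module):
    """Sketch of the first processing module: 3x3 conv -> ReLU -> 1x1 conv.

    Maps the image feature map to a 4-channel map; at each feature location the
    4 values are interpreted as (x, y, w, h) of the target image frame, i.e. the
    top-left corner, width and height in the visual-image coordinate system.
    """

    def __init__(self, in_channels=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, 4, kernel_size=1),
        )

    def forward(self, f_img):
        # f_img: (B, 256, H_f, W_f) -> boxes: (B, 4, H_f, W_f)
        return self.net(f_img)


boxes = BoxHead()(torch.randn(1, 256, 128, 128))
print(boxes.shape)  # torch.Size([1, 4, 128, 128])
```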
Furthermore, based on the target image frame, the position of the image acquisition device and the position of the radar reflection point corresponding to the image feature, the target detection device may sequentially determine whether the plurality of radar reflection points in the point cloud data are within the view frustum.
It should be noted that, based on the imaging principle of the image acquisition device, light propagates in straight lines, so when an object is imaged, an inverted image is formed on the image plane located at a distance f (i.e., the focal length) from the central point (i.e., the optical center) of the image acquisition device. Further, a virtual image plane on which the image appears upright can be obtained at the position symmetric to the image plane about the optical center. It can be understood that the pixel corresponding to the object on the virtual image plane is determined by the intersection of the line connecting the object and the optical center with that image plane. It should be understood that parameters such as the optical center and the focal length can be determined from preset parameters of the image acquisition device used to acquire the image.
Thus, the view frustum may be as shown in fig. 8; it comprises six planes, namely a far plane and a near plane, an upper plane and a lower plane, and a left plane and a right plane. The plane in which the target image frame corresponding to the image feature lies is taken as the virtual image plane, the optical center is connected to the four corner points of the target image frame, and two cross-sections at different distances from the optical center are set as the far plane and the near plane, thereby obtaining the view frustum. Illustratively, a cross-section 0 meters from the optical center may be set as the near plane, and a cross-section 200 meters from the optical center may be set as the far plane.
In an example, the target detection device may determine the view frustum shown in fig. 8 according to a target image frame corresponding to the image feature, and then determine the position of the radar reflection point according to each radar reflection point data, thereby determining whether the radar reflection point is located in the view frustum.
In another example, the target detection apparatus does not generate the view frustum shown in fig. 8, and may project the radar reflection point onto the virtual image plane where the target image frame is located based on each radar reflection point data, and determine that the radar reflection point is within the view frustum if the projection point corresponding to the radar reflection point is within the target image frame. Otherwise, the radar reflection point is determined not to be in the view frustum.
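The projection test in the second example can be sketched as follows, assuming the radar reflection point has already been transformed into the camera coordinate system and a 3 × 3 intrinsic matrix K of the image acquisition device is available; the function name, the specific intrinsic values, and the near/far bounds (taken from the 0 m / 200 m example above) are illustrative.

```python
import numpy as np


def point_in_frustum(box_xywh, point_cam, K, near=0.0, far=200.0):
    """Check whether a radar reflection point lies inside the view frustum.

    Instead of building the frustum explicitly, project the point onto the
    virtual image plane and test whether the projection falls inside the
    target image frame. point_cam is the point in camera coordinates (meters)
    and K is the 3x3 camera intrinsic matrix.
    """
    x, y, w, h = box_xywh                 # top-left corner, width, height (pixels)
    depth = point_cam[2]                  # distance along the optical axis
    if not (near < depth <= far):
        return False
    u, v, z = K @ point_cam               # pinhole projection
    u, v = u / z, v / z
    return (x <= u <= x + w) and (y <= v <= y + h)


# Usage: a point 30 m ahead, roughly centered in a 100x80-pixel target frame.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
print(point_in_frustum((600, 330, 100, 80), np.array([0.2, 0.1, 30.0]), K))  # True
```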
S12, the target detection device determines the depth information of the image characteristics and the depth information of each radar reflection point in the point cloud data.
Based on the description of the point cloud data in step S101, it can be known that one radar reflection point data may include depth information of the radar reflection point.
Alternatively, as shown in fig. 7 (a), the target detection apparatus may process the image feature through the second processing module 72 to generate depth information corresponding to the image feature.
The depth information corresponding to each image feature may include a 2-dimensional vector, which is a mean value and a standard deviation of depths of a plurality of pixel points corresponding to the image feature in a three-dimensional space.
Alternatively, the structure of the second processing module may be the same as that of the first processing module described above.
And S13, determining the association probability of the image characteristics and each radar reflection point by the target detection device according to the acquired relative position relationship, the depth information of the image characteristics and the depth information of each radar reflection point in the point cloud data.
For example, the object detection device may determine the association probability p according to the following formula (1).
p = exp(−(d − μ_d)² / (2σ_d²)) if the radar reflection point lies within the view frustum, and p = 0 otherwise. (1)

Here, d is the depth information of the radar reflection point, μ_d is the depth mean in the depth information of the image feature, σ_d is the depth standard deviation in the depth information of the image feature, and the view frustum is the spatial region determined based on the target image frame corresponding to the image feature and the position of the image acquisition device.
And S14, the target detection device determines the radar reflection points with the association probability larger than or equal to the preset probability as the radar reflection points associated with the image characteristics.
It should be noted that, for an image feature, all radar reflection points with an association probability greater than or equal to a preset probability may be determined as radar reflection points associated with the image feature according to the association probability of the image feature and each radar reflection point. It should be understood that the number of radar reflection points associated with one image feature may be one, multiple, or 0. In addition, the value of the preset probability is not limited in the application.
Alternatively, the target detection apparatus may perform the above-described steps S13 to S14 by the association module 73 as shown in (a) of fig. 7, and output the association result, which is the radar reflection point associated with the image feature.
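Under the assumption that the association probability follows the Gaussian form of formula (1), steps S13 and S14 can be sketched as follows; the function name, the threshold value, and the array-based interface are illustrative choices rather than details mandated by the method.

```python
import numpy as np


def associate_points(mu_d, sigma_d, point_depths, in_frustum, p_thresh=0.5):
    """Steps S13-S14 sketch: association probability and thresholding.

    mu_d and sigma_d come from the depth information of one image feature;
    point_depths holds the depth d of each radar reflection point, and
    in_frustum flags whether each point lies in the feature's view frustum
    (the relative position relation from S11).
    """
    point_depths = np.asarray(point_depths, dtype=float)
    in_frustum = np.asarray(in_frustum, dtype=bool)
    p = np.exp(-((point_depths - mu_d) ** 2) / (2.0 * sigma_d ** 2))
    p = np.where(in_frustum, p, 0.0)            # points outside the frustum get p = 0
    associated = np.flatnonzero(p >= p_thresh)  # indices of associated reflection points
    return associated, p


idx, p = associate_points(mu_d=25.0, sigma_d=3.0,
                          point_depths=[24.0, 26.5, 60.0],
                          in_frustum=[True, True, True])
print(idx, p.round(3))  # [0 1] [0.946 0.882 0.   ]
```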
S103, the target detection device determines radar features corresponding to each image feature according to one or more radar reflection points associated with each image feature.
Optionally, the target detection apparatus may perform feature extraction on the set of one or more radar reflection points associated with one image feature based on a preset PointNet network model. The input to the PointNet network model may be an N × 7 matrix, where N is the number of radar reflection points associated with the image feature, and each radar reflection point is represented by a 7-dimensional vector (x, y, z, v_x, v_y, v_z, p), where (x, y, z) are the position coordinates of the radar reflection point, v_x, v_y, and v_z represent the radar velocity, and p represents the association probability.
The PointNet network model is a classical network model in point cloud research, is a widely used research method in the field at present, and has excellent performance on the problems of classification, semantic segmentation (point-by-point classification) and the like.
It should be understood that when an image feature has no associated radar reflection points, the radar feature vector corresponding to that image feature may be set to zero. In addition, after point cloud feature extraction by the PointNet network model, each image feature has a corresponding 1024-dimensional radar feature vector. Finally, all radar feature vectors are arranged in the order of the image features f_img, yielding the corresponding radar features f_radar, i.e., a feature map with 1024 channels and spatial size W_f × H_f.
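A minimal PointNet-style sketch of this radar feature extraction is shown below; the shared per-point MLP widths and the max-pooling aggregation are assumptions in the spirit of PointNet rather than the exact preset network used here.

```python
import torch
import torch.nn as nn


class RadarPointNet(nn.Module):
    """Minimal PointNet-style sketch for one image feature's reflection points.

    Each associated radar reflection point is the 7-d vector
    (x, y, z, v_x, v_y, v_z, p); a shared per-point MLP lifts it to 1024
    dimensions and max-pooling aggregates the N points into a single
    1024-d radar feature vector.
    """

    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(            # shared per-point MLP
            nn.Linear(7, 64), nn.ReLU(inplace=True),
            nn.Linear(64, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 1024), nn.ReLU(inplace=True),
        )

    def forward(self, points):
        # points: (N, 7) matrix of associated radar reflection points.
        if points.shape[0] == 0:
            return torch.zeros(1024)         # no associated points -> zero vector
        return self.mlp(points).max(dim=0).values  # symmetric max-pool over N points


radar_feat = RadarPointNet()(torch.randn(5, 7))
print(radar_feat.shape)  # torch.Size([1024])
```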
And S104, detecting the space state of a target object in the region to be detected by the target detection device according to each image characteristic and the radar characteristic corresponding to each image characteristic, wherein the target object comprises a human body and/or an object.
The spatial state of the target object may include information such as a spatial position coordinate, a rotation angle, a spatial movement speed, and the like of the target object.
The rotation angle may include a pitch angle, a yaw angle, and a roll angle, among others. The pitch angle refers to an included angle between the motion direction of the target object and a horizontal plane, the yaw angle refers to an included angle between a projection direction of the motion direction of the target object on the horizontal plane and a predetermined direction on the horizontal plane, the predetermined direction can be set as a road direction, and the roll angle is used for representing a transverse inclination angle.
It should be understood that the pitch angle, yaw angle, and roll angle described above are three rotational angles used in a navigation system to identify a motion of an object.
Optionally, as shown in fig. 9, the step S104 may be implemented as the following steps:
S1041, for each image feature, the target detection device determines a joint feature of the image feature and the radar feature corresponding to the image feature.
Optionally, the target detection device performs a stitching operation on the image feature and the radar feature corresponding to the image feature, so as to obtain the joint feature.
Specifically, the target detection device may splice the image features extracted by the feature extraction model and the radar features extracted by the PointNet network model along the channel dimension, thereby obtaining the fused joint features.
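The splicing operation itself amounts to a channel-wise concatenation, for example as sketched below. The shapes follow the 256-channel image features and 1024-dimensional radar features described above; the batch dimension and the specific spatial size are illustrative.

```python
import torch

# Channel-wise splicing of image features and radar features (sketch).
# f_img has 256 channels and f_radar has 1024 channels at each image-feature
# location, so the joint feature has 1280 channels.
f_img = torch.randn(1, 256, 128, 128)
f_radar = torch.randn(1, 1024, 128, 128)
f_joint = torch.cat([f_img, f_radar], dim=1)   # concatenate along the channel dimension
print(f_joint.shape)  # torch.Size([1, 1280, 128, 128])
```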
S1042, the target detection device determines the space state of the target object in the region to be detected according to the image characteristics and the joint characteristics.
It should be noted that, when determining the spatial state of the target object in the region to be detected, prediction information such as the target category, orientation angle, size, center offset, speed information, direction information, and depth information of the target object may need to be predicted. The visual image generally carries dense and rich semantic information and therefore has clear advantages for object classification, size estimation, and the like, whereas the radar point cloud data generally lacks rich semantic information but is well suited to spatial positioning, speed measurement, and the like. The target detection apparatus can therefore predict the items of prediction information in groups: for example, the image features extracted from the visual image can be used to predict the target category, orientation angle, size, center offset, and other such information of the target object, while the speed information, direction information, and the like of the target object can be predicted based on the joint features of the image features and the radar data. In this way, the respective strengths of the data collected by the image acquisition device and by the radar are fully exploited, and the prediction accuracy can be improved.
Alternatively, the target detection apparatus may perform the following steps S21 to S24 to determine the spatial state of the target object within the region to be detected, based on the image feature and the joint feature.
And S21, the target detection device determines one or more items of target category, orientation angle, size and central offset of the target object in the region to be detected according to the image characteristics.
The target category may include, among others, vehicles, people, trees, and buildings.
The orientation angle is the angle, centered at the position of the target object, measured from a starting direction of true north or true south to the target direction line of the target object, where the target direction line may point in the direction of motion of the target object.
As shown in fig. 10, the target detection apparatus may perform information prediction on the acquired image features by using the first prediction module 101 to obtain one or more of a target type, an orientation angle, a size, and a center offset of the target object. The first prediction module 101 may include four first sub-modules with the same structure, and respectively output the target category, the orientation angle, the size, or the center offset.
As shown in (a) of fig. 11, one first sub-module of the first prediction module 101 may have a structure of a combination of a 3 × 3 convolution kernel, a linear rectification unit, and a 1 × 1 convolution kernel.
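A hedged sketch of the first prediction module with its four structurally identical first sub-modules might look as follows; the number of target categories and the per-head output channel counts are illustrative assumptions not given in the text.

```python
import torch
import torch.nn as nn


def first_sub_module(in_ch, out_ch):
    """One first sub-module: 3x3 conv -> ReLU -> 1x1 conv (fig. 11(a))."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, kernel_size=1),
    )


class FirstPredictionModule(nn.Module):
    """Sketch of the first prediction module: four structurally identical first
    sub-modules predicting, from the image features, the target category,
    orientation angle, size and center offset."""

    def __init__(self, in_ch=256, num_classes=4):
        super().__init__()
        self.category = first_sub_module(in_ch, num_classes)  # e.g. vehicle/person/tree/building
        self.orientation = first_sub_module(in_ch, 1)         # orientation angle
        self.size = first_sub_module(in_ch, 3)                # size (length, width, height)
        self.center_offset = first_sub_module(in_ch, 2)       # center offset (dx, dy)

    def forward(self, f_img):
        return (self.category(f_img), self.orientation(f_img),
                self.size(f_img), self.center_offset(f_img))


outputs = FirstPredictionModule()(torch.randn(1, 256, 128, 128))
print([o.shape[1] for o in outputs])  # [4, 1, 3, 2]
```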
And S22, the target detection device determines the speed information and the direction information of the target object in the region to be detected according to the joint characteristics.
As shown in fig. 10, the target detection apparatus may perform information prediction on the obtained joint feature through the second prediction module 102, so as to obtain speed information and direction information of the target object. As shown in fig. 11 (b), the second prediction module 102 may adopt a structure of a combination of a 1 × 1 convolution kernel, a linear rectification unit, and a 1 × 1 convolution kernel.
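A corresponding sketch of the second prediction module is given below, assuming the 1280-channel joint features from the concatenation sketch above and an illustrative 3-channel output for the speed and direction information; the intermediate width is also an assumption.

```python
import torch.nn as nn

# Second prediction module sketch (fig. 11(b)): 1x1 conv -> ReLU -> 1x1 conv
# applied to the joint features to predict speed and direction information.
second_prediction_module = nn.Sequential(
    nn.Conv2d(1280, 256, kernel_size=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(256, 3, kernel_size=1),   # e.g. (v_x, v_y, direction) per location
)
```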
And S23, the target detection device determines the depth information of the target object in the region to be detected according to the image characteristics and the joint characteristics.
Wherein the depth information of the target object comprises a distance between the target object and the target detection device.
As shown in fig. 10, the target detection apparatus may perform information prediction on the acquired image features and joint features through the third prediction module 103 to obtain the depth information of the target object. As shown in (c) of fig. 11, the third prediction module 103 may adopt a structure combining the first prediction module 101 and the second prediction module 102. The third prediction module 103 may input the image features into one or more first sub-modules based on the structure of the first prediction module 101, and input the joint features into a second sub-module based on the structure of the second prediction module 102; the third prediction module 103 may then combine the data output by each first sub-module and the data output by the second sub-module according to custom weights preset for the various data, so as to obtain the depth information of the target object. Further, the combining process may use element-wise addition, element-wise multiplication, or a combination of both.
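The following sketch illustrates one way the third prediction module could combine a first sub-module applied to the image features with a second sub-module applied to the joint features using preset weights; the element-wise weighted addition, all channel counts, and the default weights are assumptions rather than the exact configuration used here.

```python
import torch
import torch.nn as nn


class ThirdPredictionModule(nn.Module):
    """Sketch of the third prediction module (fig. 11(c)): a first sub-module on
    the image features and a second sub-module on the joint features, combined
    with preset weights to predict the depth of the target object."""

    def __init__(self, img_ch=256, joint_ch=1280, w_img=0.5, w_joint=0.5):
        super().__init__()
        self.from_image = nn.Sequential(      # first sub-module: 3x3 -> ReLU -> 1x1
            nn.Conv2d(img_ch, img_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(img_ch, 1, kernel_size=1),
        )
        self.from_joint = nn.Sequential(      # second sub-module: 1x1 -> ReLU -> 1x1
            nn.Conv2d(joint_ch, joint_ch // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(joint_ch // 4, 1, kernel_size=1),
        )
        self.w_img, self.w_joint = w_img, w_joint   # preset custom weights

    def forward(self, f_img, f_joint):
        # Combine by element-wise weighted addition (one of the options above).
        return self.w_img * self.from_image(f_img) + self.w_joint * self.from_joint(f_joint)


depth = ThirdPredictionModule()(torch.randn(1, 256, 128, 128),
                                torch.randn(1, 1280, 128, 128))
print(depth.shape)  # torch.Size([1, 1, 128, 128])
```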
And S24, the target detection device determines the space state of the target object according to one or more items of target category, orientation angle, size, center offset, speed information, direction information and depth information of the target object in the region to be detected.
As shown in fig. 10, the target detection apparatus may determine the spatial state of the target object by analyzing one or more of the target category, the orientation angle, the size, the center offset, the speed information, the direction information, and the depth information by the three-dimensional target detection module 104.
Optionally, after performing detection based on each image feature and the radar feature corresponding to each image feature, the target detection device may project the target object into the three-dimensional space representation of the region to be detected according to the detected spatial state, so as to display the position and motion state of the target object, that is, the three-dimensional spatial state of the target object.
Based on the embodiment, the method performs feature association on the visual image and the point cloud data, and then performs feature extraction on the point cloud data according to the association result to generate the radar features corresponding to the image features. And finally, fusing the two features to detect the space state of the target object in the region to be detected. Therefore, on one hand, in the processing process of the visual image and the point cloud data, the three-dimensional space information embodied by the radar point cloud data is not lost, so that the accuracy of cross-modal sensor data association between the visual image and the point cloud data can be improved. On the other hand, the target object in the region to be detected can be detected, and the three-dimensional space state of the target object can also be estimated.
The scheme provided by the embodiment of the application is mainly introduced from the perspective of a method. To implement the above functions, it includes hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is performed as hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Fig. 12 is a schematic structural diagram of an object detection apparatus 300 according to an embodiment of the present application. The object detection apparatus 300 includes an acquisition module 301, a feature association module 302, a feature extraction module 303, and an object detection module 304.
The acquiring module 301 is configured to acquire a visual image and point cloud data of a region to be detected.
A feature association module 302, configured to associate one or more image features extracted based on the visual image with the point cloud data, to obtain one or more radar reflection points associated with each image feature.
A feature extraction module 303, configured to determine, according to one or more radar reflection points associated with each image feature, a radar feature corresponding to each image feature.
And the target detection module 304 is configured to detect a spatial state of a target object in the region to be detected according to each image feature and the radar feature corresponding to each image feature, where the target object includes a human body and/or an object.
In a possible implementation manner, the feature association module 302 is specifically configured to: for a first image feature of the one or more image features, determining an association probability of the first image feature with each radar reflection point according to the first image feature and the point cloud data; and determining the radar reflection points with the association probability greater than or equal to the preset probability as the radar reflection points associated with the first image characteristics.
In another possible implementation manner, the feature association module 302 is specifically configured to: determining the relative position relation between the first image characteristic and each radar reflection point in the point cloud data; determining depth information of the first image feature and depth information of each radar reflection point in the point cloud data; and determining the association probability of the first image characteristic and each radar reflection point according to the relative position relation, the depth information of the first image characteristic and the depth information of each radar reflection point in the point cloud data.
In yet another possible implementation, the association probability satisfies the following relationship:
p = exp(−(d − μ_d)² / (2σ_d²)) if the radar reflection point is within the view frustum, and p = 0 otherwise;
where d is the depth information of the radar reflection point, μ_d is the depth mean in the depth information of the first image feature, σ_d is the depth standard deviation in the depth information of the first image feature, and the view frustum is used for indicating a spatial region determined based on the target image frame corresponding to the first image feature and the position of the image acquisition device.
In another possible implementation manner, the object detection module 304 is specifically configured to: and for a first image feature in the one or more image features, determining a joint feature of the first image feature and a radar feature corresponding to the first image feature, and determining the space state of the target object in the region to be detected according to the first image feature and the joint feature.
In another possible implementation manner, the object detection module 304 is specifically configured to: determining one or more of a target category, an orientation angle, a size and a center offset of a target object in a region to be detected according to the first image characteristics; determining speed information and direction information of a target object in the region to be detected according to the joint characteristics; determining the depth information of a target object in the region to be detected according to the first image characteristic and the joint characteristic; and determining the space state of the target object according to one or more items of the target class, the orientation angle, the size, the central offset, the speed information, the direction information and the depth information of the target object in the region to be measured.
In another possible implementation manner, the object detection module 304 is specifically configured to: inputting the first image characteristics into a first prediction module to obtain one or more of target category, orientation angle, size and central offset of a target object in a region to be measured; the first prediction module comprises one or more first sub-modules, one first sub-module comprises a 3x3 convolution kernel, a linear rectification unit and a 1x1 convolution kernel, and the one first sub-module is used for outputting the target class, the orientation angle, the size or the central offset of a target object in the region to be detected.
In another possible implementation manner, the object detection module 304 is further specifically configured to: inputting the joint characteristics into a second prediction module to obtain speed information and direction information of a target object in the region to be detected; the second prediction module comprises a 1 × 1 convolution kernel, a linear rectification unit, a 1 × 1 convolution kernel, a linear rectification unit and a 1 × 1 convolution kernel.
In another possible implementation manner, the object detection module 304 is further specifically configured to: inputting the first image characteristic and the joint characteristic into a third prediction module to obtain depth information of a target object in the region to be detected; the third prediction module comprises one or more first sub-modules and second sub-modules, the first sub-modules comprise 3x3 convolution kernels, linear rectification units and 1x1 convolution kernels, and the second sub-modules comprise 1x1 convolution kernels, linear rectification units, 1x1 convolution kernels, linear rectification units and 1x1 convolution kernels.
For the detailed description of the above alternative modes, reference may be made to the foregoing method embodiments, which are not described herein again. In addition, for any explanation and beneficial effect description of the target detection apparatus 300 provided above, reference may be made to the corresponding method embodiments described above, and details are not repeated.
As an example, in conjunction with fig. 3, the functions implemented by the acquisition module 301 of the object detection apparatus 300 may be implemented by the communication line 220 in fig. 3, but is not limited thereto.
Those of skill in the art would readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is performed as hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It should be noted that the division of the modules in fig. 12 is schematic, and is only one division of the logic functions, and there may be another division in actual implementation. For example, two or more functions may also be integrated in one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
The embodiment of the present application further provides a computer-readable storage medium, which includes computer-executable instructions, and when the computer-readable storage medium is run on a computer, the computer is caused to execute any one of the methods provided by the above embodiments. For example, one or more features of S101-S104 of FIG. 4 may be undertaken by one or more computer-executable instructions stored in the computer-readable storage medium.
Embodiments of the present application further provide a computer program product containing instructions for executing a computer, which when executed on a computer, causes the computer to perform any one of the methods provided in the foregoing embodiments.
An embodiment of the present application further provides a chip, including: a processor coupled to the memory through the interface, and an interface, when the processor executes the computer program or the computer execution instructions in the memory, the processor causes any one of the methods provided by the above embodiments to be performed.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof. When implemented using a software program, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The processes or functions according to the embodiments of the present application are generated in whole or in part when the computer-executable instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). Computer-readable storage media can be any available media that can be accessed by a computer or can comprise one or more data storage devices, such as servers, data centers, and the like, that can be integrated with the media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Although the present application has been described in conjunction with specific features and embodiments thereof, it will be evident that various modifications and combinations can be made thereto without departing from the spirit and scope of the application. Accordingly, the specification and figures are merely exemplary of the present application as defined in the appended claims and are intended to cover any and all modifications, variations, combinations, or equivalents within the scope of the present application. It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
The above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. A method of object detection, the method comprising:
acquiring a visual image and point cloud data of a region to be detected, wherein the point cloud data comprises one or more radar reflection points;
associating one or more image features extracted based on the visual image with the point cloud data to obtain one or more radar reflection points associated with each image feature;
determining a radar feature corresponding to each image feature according to the one or more radar reflection points associated with each image feature;
and detecting the space state of a target object in the region to be detected according to each image characteristic and the radar characteristic corresponding to each image characteristic, wherein the target object comprises a human body and/or an object.
2. The method of claim 1, wherein associating one or more image features extracted based on the visual image with the point cloud data resulting in one or more radar-reflecting points associated with each image feature comprises:
for a first image feature of the one or more image features, determining an association probability of the first image feature with each radar reflection point from the first image feature and the point cloud data;
and determining the radar reflection points with the association probability greater than or equal to a preset probability as the radar reflection points associated with the first image feature.
3. The method of claim 2, wherein determining the probability of association of the first image feature with each radar reflection point from the first image feature and the point cloud data comprises:
determining a relative position relationship between the first image feature and each radar reflection point in the point cloud data;
determining depth information of the first image feature and depth information of each radar reflection point in the point cloud data;
and determining the association probability of the first image characteristic and each radar reflection point according to the relative position relation, the depth information of the first image characteristic and the depth information of each radar reflection point in the point cloud data.
4. The method according to claim 2 or 3, wherein the association probability satisfies the following relation:
p = exp(−(d − μ_d)² / (2σ_d²)) if the radar reflection point is within the view frustum, and p = 0 otherwise;
wherein d is the depth information of the radar reflection point, μ_d is the depth mean in the depth information of the first image feature, σ_d is the depth standard deviation in the depth information of the first image feature, and the view frustum is used for indicating a spatial region determined based on the target image frame corresponding to the first image feature and the position of the image acquisition device.
5. The method according to any one of claims 1 to 3, wherein the detecting the spatial state of the target object in the region to be detected according to the each image feature and the radar feature corresponding to the each image feature comprises:
for a first image feature of the one or more image features, determining a joint feature of the first image feature and a radar feature corresponding to the first image feature;
and determining the space state of the target object in the region to be detected according to the first image feature and the joint feature.
6. The method of claim 5, wherein determining the spatial state of the target object within the region-under-test from the first image feature and the joint feature comprises:
determining one or more items of target categories, orientation angles, sizes and central offsets of target objects in the region to be detected according to the first image characteristics;
determining the speed information and the direction information of the target object in the region to be detected according to the joint characteristics;
determining the depth information of a target object in the region to be detected according to the first image feature and the joint feature;
and determining the space state of the target object according to one or more items of the target class, the orientation angle, the size, the central offset, the speed information, the direction information and the depth information of the target object in the region to be measured.
7. The method of claim 6, wherein determining one or more of a target class, an orientation angle, a size, and a center offset of a target object within the region of interest from the first image feature comprises:
inputting the first image characteristics into a first prediction module to obtain one or more of target category, orientation angle, size and central offset of a target object in the region to be detected;
the first prediction module comprises one or more first sub-modules, one first sub-module comprises a 3 × 3 convolution kernel, a linear rectification unit and a 1 × 1 convolution kernel, and the one first sub-module is used for outputting the target class, the orientation angle, the size or the central offset of the target object in the region to be detected.
8. The method of claim 6, wherein determining the speed information and the direction information of the target object in the region to be measured according to the joint feature comprises:
inputting the joint characteristics into a second prediction module to obtain speed information and direction information of the target object in the region to be detected;
wherein the second prediction module comprises a 1 × 1 convolution kernel, a linear rectification unit, and a 1 × 1 convolution kernel.
9. The method of claim 6, wherein determining depth information of a target object within a region under test from the first image feature and the joint feature comprises:
inputting the first image feature and the joint feature into a third prediction module to obtain depth information of a target object in the region to be measured;
the third prediction module comprises one or more first sub-modules and second sub-modules, wherein the first sub-modules comprise a 3 × 3 convolution kernel, a linear rectification unit and a 1 × 1 convolution kernel, and the second sub-modules comprise a 1 × 1 convolution kernel, a linear rectification unit, a 1 × 1 convolution kernel, a linear rectification unit and a 1 × 1 convolution kernel.
10. An object detection apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a visual image and point cloud data of a region to be detected;
the characteristic association module is used for associating one or more image characteristics extracted based on the visual image with the point cloud data to obtain one or more radar reflection points associated with each image characteristic;
the feature extraction module is used for determining radar features corresponding to each image feature according to the one or more radar reflection points associated with each image feature;
and the target detection module is used for detecting the space state of a target object in the region to be detected according to each image characteristic and the radar characteristic corresponding to each image characteristic, wherein the target object comprises a human body and/or an object.
11. The apparatus of claim 10, wherein the feature association module is specifically configured to:
for a first image feature of the one or more image features, determining an association probability of the first image feature with each radar reflection point from the first image feature and the point cloud data;
determining the radar reflection points with the association probability larger than or equal to a preset probability as the radar reflection points associated with the first image characteristics;
the feature association module is specifically configured to:
determining a relative position relationship between the first image feature and each radar reflection point in the point cloud data;
determining depth information of the first image feature and depth information of each radar reflection point in the point cloud data;
determining the association probability of the first image characteristic and each radar reflection point according to the relative position relation, the depth information of the first image characteristic and the depth information of each radar reflection point in the point cloud data;
the association probability satisfies the following relationship:
p = exp(−(d − μ_d)² / (2σ_d²)) if the radar reflection point is within the view frustum, and p = 0 otherwise;
wherein d is the depth information of the radar reflection point, μ_d is the depth mean in the depth information of the first image feature, σ_d is the depth standard deviation in the depth information of the first image feature, and the view frustum is used for indicating a spatial region determined based on the target image frame corresponding to the first image feature and the position of the image acquisition device; the target detection module is specifically configured to:
for a first image feature of the one or more image features, determining a joint feature of the first image feature and a radar feature corresponding to the first image feature;
determining the space state of a target object in the region to be detected according to the first image feature and the joint feature;
the target detection module is specifically configured to:
determining one or more items of target categories, orientation angles, sizes and central offsets of target objects in the region to be detected according to the first image characteristics;
determining the speed information and the direction information of the target object in the region to be detected according to the joint characteristics;
determining the depth information of the target object in the region to be detected according to the first image feature and the joint feature;
determining the space state of the target object according to one or more items of the target class, the orientation angle, the size, the central offset, the speed information, the direction information and the depth information of the target object in the region to be measured;
The target detection module is specifically configured to:
inputting the first image characteristics into a first prediction module to obtain one or more of target category, orientation angle, size and central offset of a target object in the region to be detected;
the first prediction module comprises one or more first sub-modules, one first sub-module comprises a 3x3 convolution kernel, a linear rectification unit and a 1x1 convolution kernel, and the one first sub-module is used for outputting a target class, an orientation angle, a size or a central offset of a target object in the region to be detected;
the target detection module is further specifically configured to:
inputting the joint characteristics into a second prediction module to obtain speed information and direction information of a target object in the region to be measured;
the second prediction module comprises a 1 × 1 convolution kernel, a linear rectification unit, a 1 × 1 convolution kernel, a linear rectification unit and a 1 × 1 convolution kernel;
the target detection module is further specifically configured to:
inputting the first image characteristic and the joint characteristic into a third prediction module to obtain depth information of a target object in the region to be detected;
the third prediction module comprises one or more first sub-modules and second sub-modules, wherein the first sub-modules comprise a 3 × 3 convolution kernel, a linear rectification unit and a 1 × 1 convolution kernel, and the second sub-modules comprise a 1 × 1 convolution kernel, a linear rectification unit, a 1 × 1 convolution kernel, a linear rectification unit and a 1 × 1 convolution kernel.
12. An electronic device, comprising a memory and a processor; the memory and the processor are coupled; the memory for storing computer program code, the computer program code comprising computer instructions;
wherein the computer instructions, when executed by the processor, cause the electronic device to perform the object detection method of any of claims 1-9.
13. A computer-readable storage medium storing computer instructions that, when executed on an electronic device, cause the electronic device to perform the target detection method of any of claims 1-9.
CN202210742731.8A 2022-06-28 2022-06-28 Target detection method and device Pending CN115131756A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210742731.8A CN115131756A (en) 2022-06-28 2022-06-28 Target detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210742731.8A CN115131756A (en) 2022-06-28 2022-06-28 Target detection method and device

Publications (1)

Publication Number Publication Date
CN115131756A true CN115131756A (en) 2022-09-30

Family

ID=83379264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210742731.8A Pending CN115131756A (en) 2022-06-28 2022-06-28 Target detection method and device

Country Status (1)

Country Link
CN (1) CN115131756A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115542308A (en) * 2022-12-05 2022-12-30 德心智能科技(常州)有限公司 Indoor personnel detection method, device, equipment and medium based on millimeter wave radar
CN115542308B (en) * 2022-12-05 2023-03-31 德心智能科技(常州)有限公司 Indoor personnel detection method, device, equipment and medium based on millimeter wave radar

Similar Documents

Publication Publication Date Title
US20210279444A1 (en) Systems and methods for depth map sampling
Henry et al. RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments
US20190033447A1 (en) Systems and methods for detecting objects in underwater environments
Carrio et al. Drone detection using depth maps
CN106233219B (en) Mobile platform operating system and method
KR20220119396A (en) Estimation of object size using camera map and/or radar information
WO2020243962A1 (en) Object detection method, electronic device and mobile platform
JP2018163096A (en) Information processing method and information processing device
US11361457B2 (en) Annotation cross-labeling for autonomous control systems
CN110663060B (en) Method, device, system and vehicle/robot for representing environmental elements
CN116027324B (en) Fall detection method and device based on millimeter wave radar and millimeter wave radar equipment
Cui et al. 3D detection and tracking for on-road vehicles with a monovision camera and dual low-cost 4D mmWave radars
US20220049961A1 (en) Method and system for radar-based odometry
Clunie et al. Development of a perception system for an autonomous surface vehicle using monocular camera, lidar, and marine radar
WO2022179207A1 (en) Window occlusion detection method and apparatus
CN115131756A (en) Target detection method and device
Ozcan et al. A novel fusion method with thermal and RGB-D sensor data for human detection
CN117808689A (en) Depth complement method based on fusion of millimeter wave radar and camera
US11561553B1 (en) System and method of providing a multi-modal localization for an object
WO2022083529A1 (en) Data processing method and apparatus
CN112651405B (en) Target detection method and device
US11280899B2 (en) Target recognition from SAR data using range profiles and a long short-term memory (LSTM) network
Šuľaj et al. Examples of real-time UAV data processing with cloud computing
CN113822372A (en) Unmanned aerial vehicle detection method based on YOLOv5 neural network
TW202206849A (en) Apparatus for measurement and method of determining distance between two points in environment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination