WO2022012158A1 - Target determination method and target determination device - Google Patents


Info

Publication number
WO2022012158A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
wave detection
millimeter
image
information
Prior art date
Application number
PCT/CN2021/094781
Other languages
French (fr)
Chinese (zh)
Inventor
原崧育 (YUAN Songyu)
杨臻 (YANG Zhen)
张维 (ZHANG Wei)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2022012158A1

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/86Combinations of radar systems with non-radar systems, e.g. sonar, direction finder
    • G01S13/867Combination of radar systems with cameras
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/86Combinations of radar systems with non-radar systems, e.g. sonar, direction finder
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/88Radar or analogous systems specially adapted for specific applications
    • G01S13/89Radar or analogous systems specially adapted for specific applications for mapping or imaging
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/88Radar or analogous systems specially adapted for specific applications
    • G01S13/91Radar or analogous systems specially adapted for specific applications for traffic control
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Definitions

  • the present application relates to the field of communication technologies, and in particular, to a target determination method and a target determination device.
  • Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
  • In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that responds in a manner similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.
  • Object detection and recognition refers to finding objects in a scene (for example, an image) and can include two processes: detection and recognition.
  • Detection specifically refers to judging whether a target exists and, if so, determining its position. Recognition specifically refers to identifying the category of the target.
  • Object detection and recognition have a wide range of applications in many fields, such as automatic driving, driving assistance, and early warning. Target detection and recognition usually require multi-sensor fusion, in which data collected by lidar, millimeter-wave radar, vision sensors, infrared sensors, and the like are fused to detect and identify objects in the environment.
  • The embodiments of the present application provide a target determination method that can improve the accuracy of associating the detection results of a vision sensor with those of a millimeter-wave radar.
  • A first aspect of the present application provides a target determination method, which is applicable to the field of automatic driving or the field of monitoring. The method may include: acquiring an image to be processed and multiple millimeter-wave detection points, where the image to be processed and the multiple millimeter-wave detection points are data obtained synchronously for the same detection target.
  • The working principle of millimeter-wave radar is to use high-frequency circuits to generate electromagnetic waves with a specific modulation frequency, to transmit those waves and receive their reflections from the target through an antenna, and to calculate the parameters of the target from the parameters of the transmitted and received waves. Millimeter-wave radar can measure the distance, speed, and azimuth of multiple targets at the same time.
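The ranging step above can be illustrated with a small sketch. The patent does not specify the modulation scheme; the linear-chirp (FMCW) model, chirp slope, and beat frequency below are illustrative assumptions, not part of the disclosure.

```python
# Hypothetical FMCW ranging sketch: the patent only says target parameters
# are calculated from the transmitted and received waves; a linear-chirp
# (FMCW) radar is assumed here for illustration.
C = 3.0e8  # speed of light, m/s

def range_from_beat(beat_freq_hz, chirp_slope_hz_per_s):
    """For a linear chirp of slope S, a beat frequency f_b corresponds to
    a round-trip delay of f_b / S, hence range R = c * f_b / (2 * S)."""
    return C * beat_freq_hz / (2.0 * chirp_slope_hz_per_s)

# Illustrative numbers: 30 MHz/us chirp slope, 2 MHz beat tone.
slope = 30e6 / 1e-6          # 3e13 Hz/s
r = range_from_beat(2e6, slope)   # -> 10 m
```

Velocity and azimuth would be obtained analogously from Doppler shift and antenna-array phase differences, which this sketch omits.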
  • A millimeter-wave detection point includes various parameters of the target. Specifically, a millimeter-wave detection point in this solution includes depth information, that is, a parameter obtained through ranging. Of course, a millimeter-wave detection point may also include other parameters; for example, it may include the depth information of the target, the speed information of the target (a parameter obtained through speed measurement), and the azimuth information of the target (a parameter obtained through azimuth measurement).
  • "Acquired synchronously" can be understood to mean either that the millimeter-wave radar and the image sensor collect data simultaneously, or that the deviation between the frame rates at which the two sensors collect data is within a preset range. For example, if the millimeter-wave radar collects the millimeter-wave detection points at a first frame rate, the image sensor collects the image to be processed at a second frame rate, and the deviation between the first frame rate and the second frame rate is less than a preset threshold, the millimeter-wave detection points and the image to be processed can be considered data acquired synchronously.
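A minimal sketch of this synchronization criterion (the threshold value is an arbitrary illustration; the patent leaves it as a "preset threshold"):

```python
def acquired_synchronously(radar_frame_rate, camera_frame_rate,
                           preset_threshold=2.0):
    """Treat the radar detection points and the image as synchronously
    acquired when the frame-rate deviation is below a preset threshold
    (the default of 2 Hz is illustrative only)."""
    return abs(radar_frame_rate - camera_frame_rate) < preset_threshold

ok = acquired_synchronously(20.0, 21.0)       # small deviation -> True
bad = acquired_synchronously(10.0, 30.0)      # large deviation -> False
```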
  • the image to be processed can be obtained through the vision sensor, and multiple millimeter-wave detection points can be obtained through the millimeter-wave radar.
  • the image to be processed may be an image obtained by the vehicle through a visual sensor, and specifically, the image to be processed may be an image captured by the vehicle through a camera installed on the vehicle.
  • the image to be processed may be an image acquired by a visual sensor installed on the roadside, and specifically, the image to be processed may be an image captured by a camera installed on the roadside.
  • Each millimeter-wave detection point may include depth information, where the depth information is used to represent the distance between the detection target and the millimeter-wave radar, and the millimeter-wave radar is used to acquire multiple millimeter-wave detection points.
  • The detection target can be any target, such as a vehicle, a person, or a tree. The multiple millimeter-wave detection points are mapped onto the image to be processed.
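Mapping radar detection points onto the image is typically done with a calibrated pinhole projection. The patent does not give the calibration details; the intrinsic matrix and identity extrinsics below are purely illustrative assumptions.

```python
import numpy as np

# Illustrative intrinsics (focal length 1000 px, principal point 640, 360);
# real values come from camera calibration, which the patent does not detail.
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])

def map_detection_points(points_radar, R=np.eye(3), t=np.zeros(3)):
    """Project 3-D millimeter-wave detection points (metres, camera-style
    axes: x right, y down, z forward) to pixel coordinates.
    Returns an array of (u, v, depth) rows."""
    pts_cam = points_radar @ R.T + t      # radar frame -> camera frame
    uvw = pts_cam @ K.T                   # pinhole projection
    depth = uvw[:, 2]
    uv = uvw[:, :2] / depth[:, None]
    return np.hstack([uv, depth[:, None]])

pts = np.array([[0.0, 0.0, 10.0]])        # one point 10 m straight ahead
mapped = map_detection_points(pts)        # lands at the principal point
```

Each mapped row carries the pixel position (the "position information") alongside the radar depth (the "depth information") used by the later steps.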
  • a plurality of candidate frames of the detection target on the to-be-processed image are determined according to the first information.
  • The first information may include the depth information and position information of each millimeter-wave detection point, where the position information is used to indicate the position at which each millimeter-wave detection point is mapped on the image to be processed.
  • a set of candidate frames can be determined according to the depth information and position information of each millimeter wave detection point, and the set of candidate frames includes multiple candidate frames.
  • The target frame is determined according to the depth information and position information of the target millimeter-wave detection point. It can be seen from the first aspect that the millimeter-wave detection points are mapped onto the image to be processed, multiple candidate frames are determined according to the position information and depth information of the millimeter-wave detection points, and NMS processing is performed on the multiple candidate frames according to the depth information; when the final candidate frame is determined, the millimeter-wave detection point associated with that candidate frame can be output, which improves the accuracy of target matching.
  • In a first possible implementation manner of the first aspect, performing non-maximum suppression (NMS) processing on the multiple candidate frames according to the depth information may include: performing NMS processing on the multiple candidate frames according to a first score and a second score. The first score represents the probability, determined by a classifier, that the detection target in each candidate frame belongs to each of N categories, where the N categories are preset categories and N is a positive integer. The second score represents the probability, determined according to a first probability distribution between the depth information and each category, that the detection target in each candidate frame belongs to each of the N categories.
  • It can be seen from the first possible implementation manner of the first aspect that performing NMS processing on the multiple candidate frames through the first score and the second score improves the accuracy of data association, that is, the accuracy of one-to-one matching of the same target across different sensors.
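A sketch of such a depth-aware NMS. The patent does not state how the two scores are combined; multiplying them is an assumed fusion rule, and the boxes and scores below are toy values.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def depth_aware_nms(boxes, first_scores, second_scores, iou_thr=0.5):
    """Greedy NMS ranked by a combined score; the product fusion of the
    classifier score and the depth-based score is an assumption, not a
    rule stated in the patent."""
    combined = [f * s for f, s in zip(first_scores, second_scores)]
    order = sorted(range(len(boxes)), key=lambda i: combined[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thr]
    return keep

# Two heavily overlapping candidates and one distant one: the overlap is
# resolved in favour of the higher combined score.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
kept = depth_aware_nms(boxes, [0.9, 0.8, 0.7], [0.9, 0.95, 0.6])
```

Because each kept frame originated from a specific detection point, the associated millimeter-wave detection point can be output together with the frame.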
  • In a second possible implementation manner of the first aspect, the method may further include: performing statistics on the data in a first set and determining a probability distribution of a first size of the statistical targets corresponding to each category, where the first set may include multiple statistical targets corresponding to each category and the size information of each statistical target.
  • The first probability distribution is determined according to the probability distribution of the first size and a first relationship, where the first relationship is the relationship between the size of a statistical target and the depth information of the millimeter-wave detection point corresponding to that statistical target. It can be seen from the second possible implementation manner of the first aspect that a specific manner of determining the first probability distribution is given, which increases the diversity of the scheme.
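One way to realize such a depth-conditioned score is to back-project a candidate frame's pixel height to a physical height using the radar depth, then score it against a per-category size distribution. The statistics and focal length below are invented placeholders, not values from the patent.

```python
import math

# Placeholder per-category physical-height statistics (mean, std in metres);
# in the patent these come from statistics over the "first set".
HEIGHT_STATS = {
    "pedestrian": (1.7, 0.15),
    "vehicle":    (1.5, 0.20),
}
FOCAL_PX = 1000.0  # assumed focal length in pixels

def second_score(box_height_px, depth_m, category):
    """Unnormalised Gaussian likelihood that a candidate frame of the
    given pixel height, at the given radar depth, belongs to `category`."""
    mean_m, std_m = HEIGHT_STATS[category]
    est_height_m = box_height_px * depth_m / FOCAL_PX  # H = h * Z / f
    z = (est_height_m - mean_m) / std_m
    return math.exp(-0.5 * z * z)

# A 170 px tall frame at 10 m back-projects to 1.7 m, so the pedestrian
# category scores higher than the vehicle category.
ped = second_score(170.0, 10.0, "pedestrian")
veh = second_score(170.0, 10.0, "vehicle")
```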
  • In a third possible implementation manner of the first aspect, the method may further include: performing statistics on the data in a second set and determining a probability distribution of a second size of the statistical targets corresponding to each category, where the second set may include multiple statistical targets corresponding to each category and the size information of each statistical target.
  • A second probability distribution is determined according to the probability distribution of the second size and a second relationship; the second probability distribution is used to update the first probability distribution, and the second relationship is the relationship between the size of a statistical target and the depth information of the millimeter-wave detection point corresponding to that statistical target.
  • In this way the data can be updated. For example, in an autonomous driving scenario, the probability distribution between the depth information and each category can be determined through the updated data.
  • In a fourth possible implementation manner of the first aspect, the size information is the height information of the statistical target. It can be seen from the fourth possible implementation manner of the first aspect that a specific category of size information is given, which increases the diversity of the scheme.
  • In a fifth possible implementation manner of the first aspect, the position information is used, in combination with the distribution characteristics of the millimeter-wave detection points on the vehicle, to determine the position of the candidate frame in the image to be processed. It can be seen from the fifth possible implementation manner of the first aspect that considering the distribution characteristics of the millimeter-wave detection points on the vehicle makes it possible to better determine, from the positions of the millimeter-wave detection points, the position of the detection target in the image to be processed.
  • For example, if a millimeter-wave detection point is generally located at the lower-left corner of the target vehicle, multiple a priori frames can be determined with the millimeter-wave detection point at the lower-left corner of each a priori frame. If the distribution characteristics of the millimeter-wave detection points are not considered and the position of the a priori frame is determined arbitrarily from the millimeter-wave detection point, the probability that the a priori frame covers the detection target will decrease.
  • In a sixth possible implementation manner of the first aspect, the depth information is used to determine the size of the candidate frame, and the size of the candidate frame is negatively correlated with the depth information.
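The two ideas above (detection point at the lower-left corner, size inversely related to depth) can be sketched as follows; the reference size and aspect ratios are illustrative choices not specified in the patent.

```python
def prior_frames(u, v, depth_m, ref_size_px=1500.0,
                 aspect_ratios=(0.5, 1.0, 2.0)):
    """Generate a priori frames whose lower-left corner coincides with the
    mapped millimeter-wave detection point (u, v); the frame size scales
    as 1/depth, i.e. negatively correlated with the depth information.
    ref_size_px and the aspect ratios are hypothetical parameters."""
    base = ref_size_px / depth_m
    frames = []
    for ar in aspect_ratios:                  # ar = width / height
        h = base / ar ** 0.5
        w = base * ar ** 0.5
        # image y grows downward, so the lower-left corner is (u, v)
        frames.append((u, v - h, u + w, v))
    return frames

near = prior_frames(100.0, 400.0, 10.0)   # frames for a point at 10 m
far = prior_frames(100.0, 400.0, 20.0)    # same point, twice the depth
```

Doubling the depth halves each frame dimension, reflecting that more distant targets occupy fewer pixels.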
  • In a seventh possible implementation manner of the first aspect, the method may further include: processing the image to be processed through a faster region-based convolutional neural network (Faster R-CNN) to obtain a first feature map of the image to be processed.
  • the second feature maps corresponding to the plurality of candidate frames are extracted from the first feature map.
  • the second feature map is processed through a regression network and a classifier to obtain a first result, and the first result is used for non-maximum suppression NMS processing.
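A minimal sketch of extracting the second feature maps: crop the window of the first feature map covered by a candidate frame, given the backbone's down-sampling stride. The stride of 16 is a typical Faster R-CNN value assumed here; a real pipeline would additionally apply RoI pooling to a fixed output size before the regression network and classifier.

```python
import numpy as np

def second_feature_map(first_feature_map, candidate_frame, stride=16):
    """Crop the cells of the first feature map that a candidate frame
    (x1, y1, x2, y2 in image pixels) covers, assuming the backbone
    down-samples the image by `stride`."""
    x1, y1, x2, y2 = (int(round(c / stride)) for c in candidate_frame)
    return first_feature_map[y1:y2, x1:x2]

fmap = np.zeros((40, 40))                        # toy first feature map
crop = second_feature_map(fmap, (160, 160, 320, 320))
```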
  • In an eighth possible implementation manner of the first aspect, the image to be processed is acquired through the vision sensor, the sampling frequency of the vision sensor is a first frequency, the sampling frequency of the millimeter-wave radar is a second frequency, and the difference between the first frequency and the second frequency is not greater than a preset threshold.
  • A second aspect of the present application provides a target determination device, which may include: an acquisition module configured to acquire an image to be processed and multiple millimeter-wave detection points, where the image to be processed and the multiple millimeter-wave detection points are data obtained synchronously for the same detection target; each millimeter-wave detection point may include depth information, the depth information is used to indicate the distance between the detection target and the millimeter-wave radar, and the millimeter-wave radar is used to obtain the multiple millimeter-wave detection points.
  • the mapping module is used to map the multiple millimeter wave detection points acquired by the acquisition module to the to-be-processed image acquired by the acquisition module.
  • The processing module is configured to determine multiple candidate frames of the detection target on the image to be processed according to first information, where the first information may include the depth information and position information of each millimeter-wave detection point, and the position information is used to indicate the position at which each millimeter-wave detection point is mapped on the image to be processed.
  • The processing module is further configured to perform non-maximum suppression (NMS) processing on the multiple candidate frames according to the depth information, so as to output a target frame and a target millimeter-wave detection point, where the target frame is determined according to the depth information and position information of the target millimeter-wave detection point.
  • In a first possible implementation manner of the second aspect, the processing module is specifically configured to perform non-maximum suppression (NMS) processing on the multiple candidate frames according to a first score and a second score, where the first score represents the probability, determined by the classifier, that the detection target in each candidate frame belongs to each of the N categories, the N categories are preset categories, N is a positive integer, and the second score represents the probability, determined according to the first probability distribution between the depth information and each category, that the detection target in each candidate frame belongs to each of the N categories.
  • In a second possible implementation manner of the second aspect, the target determination device may further include a statistics module configured to perform statistics on the data in the first set and determine the probability distribution of the first size of the statistical targets corresponding to each category, where the first set may include multiple statistical targets corresponding to each category and the size information of each statistical target.
  • The first probability distribution is determined according to the probability distribution of the first size and the first relationship, where the first relationship is the relationship between the size of a statistical target and the depth information of the millimeter-wave detection point corresponding to that statistical target.
  • The statistics module is further configured to: perform statistics on the data in the second set and determine the probability distribution of the second size of the statistical targets corresponding to each category, where the second set may include multiple statistical targets corresponding to each category and the size information of each statistical target.
  • The second probability distribution is determined according to the probability distribution of the second size and the second relationship; the second probability distribution is used to update the first probability distribution, and the second relationship is the relationship between the size of a statistical target and the depth information of the millimeter-wave detection point corresponding to that statistical target.
  • the size information is height information of the statistical target.
  • The position information is used, in combination with the distribution characteristics of the millimeter-wave detection points on the vehicle, to determine the position of the candidate frame in the image to be processed.
  • The depth information is used to determine the size of the candidate frame, and the size of the candidate frame is negatively correlated with the depth information.
  • The processing module is further configured to process the image to be processed to obtain a first feature map of the image to be processed.
  • the second feature maps corresponding to the plurality of candidate frames are extracted from the first feature map.
  • the second feature map is processed through a regression network and a classifier to obtain a first result, and the first result is used for non-maximum suppression NMS processing.
  • The image to be processed is acquired through the vision sensor, the sampling frequency of the vision sensor is the first frequency, the sampling frequency of the millimeter-wave radar is the second frequency, and the difference between the first frequency and the second frequency is not greater than a preset threshold.
  • a third aspect of the present application provides a smart car.
  • the smart car may include a processor, the processor is coupled with a memory, and the memory stores program instructions.
  • When the program instructions stored in the memory are executed by the processor, the method described in the first aspect or any possible implementation manner of the first aspect is performed.
  • a fourth aspect of the present application provides a monitoring device.
  • the monitoring device has a processor, the processor is coupled to a memory, and the memory stores program instructions.
  • When the program instructions stored in the memory are executed by the processor, the method described in the first aspect or any possible implementation manner of the first aspect is performed.
  • a fifth aspect of the present application provides a computer-readable storage medium, which may include a program that, when executed on a computer, causes the computer to execute the method described in the first aspect or any possible implementation manner of the first aspect.
  • a sixth aspect of the present application provides a target determination system.
  • The target determination system may include an end-side device and a cloud-side device. The end-side device is configured to acquire an image to be processed and multiple millimeter-wave detection points, where the image to be processed and the multiple millimeter-wave detection points are data obtained synchronously for the same detection target.
  • Each millimeter-wave detection point may include depth information, the depth information is used to indicate the distance between the detection target and the millimeter-wave radar, and the millimeter-wave radar is used to obtain the multiple millimeter-wave detection points.
  • the cloud-side device is used to receive the to-be-processed image and multiple millimeter-wave detection points sent by the end-side device.
  • the cloud-side device is also used to map multiple millimeter wave detection points to the image to be processed.
  • The cloud-side device is further configured to determine multiple candidate frames of the detection target on the image to be processed according to the first information, where the first information may include the depth information and position information of each millimeter-wave detection point, and the position information is used to indicate the position at which each millimeter-wave detection point is mapped on the image to be processed.
  • The cloud-side device is further configured to perform non-maximum suppression (NMS) processing on the multiple candidate frames according to the depth information to output a target frame and a target millimeter-wave detection point, where the target frame is determined according to the depth information and position information of the target millimeter-wave detection point.
  • The cloud-side device is specifically configured to perform non-maximum suppression (NMS) processing on the multiple candidate frames according to the first score and the second score, where the first score represents the probability, determined by the classifier, that the detection target in each candidate frame belongs to each of the N categories, the N categories are preset categories, N is a positive integer, and the second score represents the probability, determined according to the first probability distribution between the depth information and each category, that the detection target in each candidate frame belongs to each of the N categories.
  • The cloud-side device is further configured to perform statistics on the data in the first set and determine the probability distribution of the first size of the statistical targets corresponding to each category, where the first set may include multiple statistical targets corresponding to each category and the size information of each statistical target.
  • The first probability distribution is determined according to the probability distribution of the first size and the first relationship, where the first relationship is the relationship between the size of a statistical target and the depth information of the millimeter-wave detection point corresponding to that statistical target.
  • The cloud-side device is further configured to perform statistics on the data in the second set and determine the probability distribution of the second size of the statistical targets corresponding to each category, where the second set may include multiple statistical targets corresponding to each category and the size information of each statistical target.
  • The second probability distribution is determined according to the probability distribution of the second size and the second relationship; the second probability distribution is used to update the first probability distribution, and the second relationship is the relationship between the size of a statistical target and the depth information of the millimeter-wave detection point corresponding to that statistical target.
  • the size information is height information of the statistical target.
  • The position information is used, in combination with the distribution characteristics of the millimeter-wave detection points on the vehicle, to determine the position of the candidate frame in the image to be processed.
  • The depth information is used to determine the size of the candidate frame, and the size of the candidate frame is negatively correlated with the depth information.
  • The cloud-side device is further configured to process the image to be processed through Faster R-CNN to obtain a first feature map of the image to be processed.
  • the second feature maps corresponding to the plurality of candidate frames are extracted from the first feature map.
  • the second feature map is processed through a regression network and a classifier to obtain a first result, and the first result is used for non-maximum suppression NMS processing.
  • The end-side device acquires the image to be processed through a vision sensor, the sampling frequency of the vision sensor is the first frequency, the sampling frequency of the millimeter-wave radar is the second frequency, and the difference between the first frequency and the second frequency is not greater than a preset threshold.
  • A seventh aspect of the present application provides a model training method, which may include: acquiring a training image and multiple millimeter-wave detection points, where the training image and the multiple millimeter-wave detection points are data obtained synchronously for the same detection target. Each millimeter-wave detection point may include depth information, the depth information is used to indicate the distance between the detection target and the millimeter-wave radar, and the millimeter-wave radar is used to obtain the multiple millimeter-wave detection points. The multiple millimeter-wave detection points are mapped onto the training image.
  • Multiple candidate frames of the detection target on the training image are determined according to the first information, where the first information may include the depth information and position information of each millimeter-wave detection point, and the position information is used to indicate the position at which each millimeter-wave detection point is mapped on the training image.
  • The model is trained according to the feature maps corresponding to the multiple candidate frames.
  • the position information is used to determine the position of the candidate frame in the training image in combination with the distribution characteristics of the millimeter wave detection points on the vehicle.
  • the depth information is used to determine the size of the candidate frame, and the size of the candidate frame is negatively correlated with the depth information.
  • Convolution processing may also be performed on the training image to obtain a first feature map of the training image; second feature maps corresponding to the multiple candidate frames are extracted from the first feature map, and the model is trained according to the second feature maps.
  • The training image is acquired through the vision sensor, the sampling frequency of the vision sensor is the first frequency, the sampling frequency of the millimeter-wave radar is the second frequency, and the difference between the first frequency and the second frequency is not greater than a preset threshold.
  • Figure 1a is a schematic flowchart of the fusion of detection results at the target level
  • Figure 1b is a schematic flowchart of feature-level fusion
  • Figure 2 is a schematic diagram of the detection performance of heterogeneous sensors in different dimensions
  • FIG. 3 is a schematic structural diagram of a convolutional neural network provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of another convolutional neural network provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a faster region-based convolutional neural network (Faster R-CNN).
  • FIG. 6 is a schematic flowchart of a target determination method provided by an embodiment of the present application.
  • FIG. 7a is a schematic diagram of an application scenario of a target determination method provided by the present application.
  • FIG. 7b is a schematic diagram of an application scenario of another target determination method provided by the present application.
  • FIG. 7c is a schematic diagram of an application scenario of another target determination method provided by the present application.
  • FIG. 7d is a schematic diagram of an application scenario of another target determination method provided by the present application.
  • FIG. 7e is a schematic diagram of an application scenario of another target determination method provided by the present application.
  • FIG. 8 is a schematic flowchart of another target determination method provided by an embodiment of the present application.
  • FIG. 9a is a schematic diagram of an application scenario of another target determination method provided by the present application.
  • FIG. 9b is a schematic diagram of an application scenario of another target determination method provided by the present application.
  • FIG. 10 is a schematic diagram of the probability distribution of the first size provided by the present application.
  • FIG. 11 is a schematic flowchart of another target determination method provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of an application scenario of a target determination method provided by the present application.
  • FIG. 13 is a comparison diagram of the effect of the scheme provided by an embodiment of the present application and other schemes.
  • FIG. 14 is a schematic flowchart of a model training method provided by an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of a target determination device provided by the present application.
  • FIG. 16 is a schematic structural diagram of a model training device provided by the present application.
  • FIG. 17 is a schematic structural diagram of another target determination device provided by the present application.
  • FIG. 18 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • Multi-sensor information fusion is an information-processing process that uses computer technology to automatically analyze and synthesize information or data from multiple sensors or multiple sources under certain criteria, so as to complete the required decision-making and estimation.
  • the definition of sensor data fusion can be summarized as follows: the local data resources provided by multiple sensors of the same or different types, distributed at different locations, are synthesized and analyzed by computer technology to eliminate possible redundancy and contradictions between the multi-sensor information, so that the sensors complement each other and their uncertainty is reduced, and a consistent interpretation and description of the measured target is obtained; this improves the rapidity and correctness of system decision-making, planning, and response, and enables the system to obtain more adequate information.
  • the same type of sensor is sometimes referred to as a homogeneous sensor, and a different type of sensor is referred to as a heterogeneous sensor.
  • this application sometimes refers to multi-sensor information fusion as multi-sensor data fusion, or multi-sensor fusion, and when their differences are not emphasized, they mean the same thing.
  • Sensor information fusion can be performed at different levels, such as fusion of target-level detection results (high-level fusion) and feature-level fusion.
  • high-level fusion refers to the fusion of target-level detection results of multiple homogeneous or heterogeneous sensors after obtaining target-level detection results from the data of a single sensor.
  • Feature-level fusion refers to fusing the features extracted from the measurement data of multiple homogeneous or heterogeneous sensors, after feature extraction is performed on the data of each single sensor, to form the target-level detection result. FIG. 1a and FIG. 1b are described below.
  • FIG. 1a is a schematic flowchart of target-level detection result fusion
  • FIG. 1b is a schematic flowchart of feature-level fusion.
  • the data acquired by the first sensor is processed by the first perception algorithm to output the first target-level detection result of the target.
  • the data acquired by the second sensor is processed by the second perception algorithm to output the second target-level detection result of the target.
  • the data acquired by the third sensor is processed by the third perception algorithm to output the third target-level detection result of the target.
  • the first target level detection result, the second target level detection result and the third target level detection result are then fused.
  • each sensor independently processes the generated object data.
  • Each sensor has its own independent perception.
  • lidar has the perception of lidar
  • camera has the perception of camera
  • millimeter-wave radar will also make its own perception.
  • After all sensors complete target data generation, the main processor performs data fusion. As shown in FIG. 1b, it is assumed that there are multiple sensors, namely a first sensor, a second sensor, and a third sensor. In the feature-level fusion scenario, there is only one perception algorithm, which perceives the fused multi-dimensional comprehensive data. Since there is only one perception algorithm, the data acquired by each sensor needs to be synchronized in time and space.
  • the synchronization of time is to ensure that the data collected by different sensors are synchronized in time
  • the synchronization of space is to convert the measurement values of different sensors from their respective coordinate systems to the same coordinate system, that is, the coordinate systems are unified.
  • the common methods of multi-sensor data fusion can be basically summarized into two categories: random and artificial intelligence.
  • the random methods include weighted average method, Kalman filter method, multi-Bayesian estimation method, evidence inference, production rules, etc.;
  • the intelligent category includes fuzzy logic theory, neural network, rough set theory, expert system and so on.
  • FIG. 2 is a schematic diagram of the detection performance of heterogeneous sensors in different dimensions.
  • FIG. 2 shows three kinds of sensors (camera, millimeter-wave radar, and lidar) and the detection performance of these three kinds of sensors in 7 different dimensions, namely target detection, target recognition, distance measurement, object edge detection, lane tracking, performance in inclement weather, and performance in dark or heavily exposed conditions.
  • For example, both millimeter-wave radar and lidar have good detection performance in target detection, so the accuracy of data association between millimeter-wave radar and lidar is relatively high. Conversely, when two sensors do not both perform well in a given dimension, the accuracy of data association between them is low. As can be seen from FIG. 2, there is no dimension in which the camera and the millimeter-wave radar both have good detection performance, so the accuracy of data association between the camera and the millimeter-wave radar is usually low.
  • However, the measurement characteristics of the camera and the millimeter-wave radar complement each other very well. Therefore, how to correlate the detection results of cameras and millimeter-wave radars is of great significance.
  • the solution provided by this application needs to correlate the output data of heterogeneous sensors through a neural network.
  • Since the following involves a lot of knowledge related to neural networks, the relevant knowledge is introduced first. It should be noted that the solution provided in this application does not limit the type of neural network, and any neural network that can be used for target detection can be used in the embodiments of this application.
  • Convolutional neural network (CNN)
  • convolutional neural network is a deep neural network with a convolutional structure
  • it is a deep learning architecture that learns at multiple levels of abstraction.
  • a CNN is a feed-forward artificial neural network in which each neuron responds to overlapping regions in images fed into it.
  • a convolutional neural network (CNN) 100 may include an input layer 110 , a convolutional/pooling layer 120 , where the pooling layer is optional, and a neural network layer 130 .
  • the convolutional/pooling layer 120 may include layers 121-126 as examples.
  • in one implementation, layer 121 is a convolutional layer, layer 122 is a pooling layer, layer 123 is a convolutional layer, layer 124 is a pooling layer, layer 125 is a convolutional layer, and layer 126 is a pooling layer; in another implementation, layers 121 and 122 are convolutional layers, layer 123 is a pooling layer, layers 124 and 125 are convolutional layers, and layer 126 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • the convolution layer 121 may include many convolution operators, which are also called kernels, and their role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • the convolution operator can essentially be a weight matrix, which is usually pre-defined. In the process of convolving an image, the weight matrix is usually moved along the horizontal direction on the input image pixel by pixel (or two pixels by two pixels, depending on the value of the stride), so as to complete the work of extracting specific features from the image.
  • the size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image.
  • during the convolution operation, the weight matrix extends over the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolutional output with a single depth dimension; however, in most cases a single weight matrix is not used, but multiple weight matrices of the same dimension are applied.
  • the output of each weight matrix is stacked to form the depth dimension of the convolutional image.
  • Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract image edge information, another weight matrix is used to extract specific colors of the image, and yet another weight matrix is used to blur unwanted noise in the image, and so on.
  • the dimensions of the multiple weight matrices are the same, so the dimensions of the feature maps extracted by these weight matrices are also the same, and the multiple extracted feature maps of the same dimension are then combined to form the output of the convolution operation.
  • weight values in these weight matrices need to be obtained through a lot of training in practical applications, and each weight matrix formed by the weight values obtained by training can extract information from the input image, thereby helping the convolutional neural network 100 to make correct predictions.
  • when a convolutional neural network has multiple convolutional layers, the initial convolutional layer (for example, layer 121) often extracts more general features, while the features extracted by the later convolutional layers become more and more complex, such as high-level semantic features.
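The convolution operation described above (multiple weight matrices slid over the input, their outputs stacked to form the depth dimension) can be illustrated with a minimal NumPy sketch. The two 3x3 edge-extraction kernels, the input image, and the stride of 1 are illustrative assumptions, not values from the application:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one weight matrix over the image (stride 1, no padding)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Two illustrative kernels: vertical-edge and horizontal-edge extraction.
kernel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
kernel_y = kernel_x.T

image = np.arange(36, dtype=float).reshape(6, 6)
# Each weight matrix yields one output channel; stacking the channels
# forms the depth dimension of the convolved image.
feature_map = np.stack([conv2d_valid(image, kernel_x),
                        conv2d_valid(image, kernel_y)], axis=0)
print(feature_map.shape)  # (2, 4, 4): two channels, 4x4 spatial size
```

With more kernels, the depth of the output grows accordingly, which is why in most cases multiple weight matrices of the same dimension are applied.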
  • a pooling layer is often introduced after a convolutional layer; that is, among the layers 121 to 126 exemplified by 120 in FIG. 3, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers.
  • the pooling layer may include an average pooling operator and/or a max pooling operator for sampling the input image to obtain a smaller size image.
  • the average pooling operator can calculate the average value of the pixel values in the image within a certain range.
  • the max pooling operator can take the pixel with the largest value within a specific range as the result of max pooling. Also, just as the size of the weight matrix used in the convolutional layer should be related to the size of the image, the operators in the pooling layer should also be related to the size of the image.
  • the size of the output image after processing by the pooling layer can be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the image input to the pooling layer.
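The average and max pooling operators described above can be sketched as follows; the 4x4 input and the 2x2 pooling window with matching stride are assumed for the example:

```python
import numpy as np

def pool2d(x, size, op):
    """Pooling with a size x size window and a stride equal to the window,
    so each output pixel summarizes one non-overlapping sub-region."""
    h, w = x.shape[0] // size, x.shape[1] // size
    blocks = x[:h * size, :w * size].reshape(h, size, w, size)
    return op(blocks, axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 10, 13, 14],
              [11, 12, 15, 16]], dtype=float)
print(pool2d(x, 2, np.max))   # max pooling: largest pixel per 2x2 region
print(pool2d(x, 2, np.mean))  # average pooling: mean pixel per 2x2 region
```

In both cases the output is smaller than the input, and each output pixel represents the maximum or average of the corresponding sub-region, as described above.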
  • After being processed by the convolutional layer/pooling layer 120, the convolutional neural network 100 is still not sufficient to output the required output information, because, as mentioned before, the convolutional layer/pooling layer 120 only extracts features and reduces the number of parameters brought by the input image. However, to generate the final output information (the required class information or other related information), the convolutional neural network 100 needs to use the neural network layer 130 to generate one output or a set of outputs of the required number of classes. Therefore, the neural network layer 130 may include multiple hidden layers (131, 132 to 13n as shown in FIG. 3) and an output layer 140; the parameters contained in the multiple hidden layers may be pre-trained based on relevant training data of a specific task type, and the task type can include, for example, image recognition, image classification, image super-resolution reconstruction, and so on.
  • After the multiple hidden layers in the neural network layer 130, the last layer of the entire convolutional neural network 100 is the output layer 140. The output layer 140 has a loss function similar to categorical cross-entropy and is specifically used to calculate the prediction error. Once the forward propagation of the entire convolutional neural network 100 (propagation from 110 to 140 in FIG. 3) is completed, back propagation (propagation from 140 to 110 in FIG. 3) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 100 and the error between the result output by the convolutional neural network 100 through the output layer and the ideal result.
  • the convolutional neural network 100 shown in FIG. 3 is only used as an example of a convolutional neural network.
  • the convolutional neural network can also exist in the form of other network models; for example, the multiple convolutional layers/pooling layers shown in FIG. 4 are in parallel, and the separately extracted features are all input to the neural network layer 130 for processing.
  • the neural network of the present application may adopt a faster region-based convolutional neural network (Faster R-CNN).
  • the Faster RCNN target detection algorithm is a typical target detection algorithm.
  • a multi-layer convolution layer is used to extract the basic feature map of the image.
  • In the Faster R-CNN algorithm, a region proposal network (RPN) generates a large number of candidate boxes; the candidate boxes are filtered and screened, and only a fixed number of candidate boxes are selected and input into the next-level module, where a deeper classification analysis is carried out; finally, the final candidate box containing the target is obtained.
  • Faster R-CNN can include four parts, namely a convolutional layer, an RPN network, a region-of-interest pooling layer (ROI pooling), and a classification layer and regression network. Each of them is described below.
  • the convolutional layer has been introduced above. It is mainly used to extract the features of the picture. The input is the entire picture, and the output is the extracted features. The extracted features are generally called feature maps.
  • the RPN network is used to recommend candidate regions. The input is a picture, and the output is multiple candidate regions. It should be noted that the solution provided in this application does not output candidate regions through the RPN network, which will be described later.
  • a candidate region is sometimes referred to as a candidate frame, and when the difference between the two is not emphasized, the two have the same meaning.
  • the process of roi pooling can be understood as the process of pooling candidate regions.
  • Specifically, max pooling is performed on the parts of the feature map corresponding to the candidate regions, and a second feature map is obtained, which is sent on for subsequent calculation.
  • the second feature map is the feature map corresponding to the candidate region.
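The ROI pooling step described above can be sketched as cropping a candidate region from the first feature map and max-pooling the crop to a fixed-size grid. The feature map values, the candidate box coordinates, and the 2x2 output size are illustrative assumptions:

```python
import numpy as np

def roi_max_pool(feature_map, box, out_size=2):
    """Crop a candidate box from the feature map, then max-pool the crop
    into a fixed out_size x out_size grid (the "second feature map")."""
    x0, y0, x1, y1 = box
    region = feature_map[y0:y1, x0:x1]
    rows = np.array_split(np.arange(region.shape[0]), out_size)
    cols = np.array_split(np.arange(region.shape[1]), out_size)
    return np.array([[region[np.ix_(r, c)].max() for c in cols] for r in rows])

first_feature_map = np.arange(64, dtype=float).reshape(8, 8)
candidate_box = (1, 1, 7, 5)  # hypothetical (x0, y0, x1, y1) candidate region
second_feature_map = roi_max_pool(first_feature_map, candidate_box)
print(second_feature_map.shape)  # (2, 2)
```

Whatever the size of the candidate region, the output grid has a fixed size, which is what allows candidate regions of different sizes to be fed into the same classification and regression network.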
  • the classification layer and the regression network further process the second feature map, and output the class to which the candidate region belongs and the position of the candidate region in the image through the classification layer and the regression network.
  • the solution provided in this application may include two parts, the “inference” process and the “training” process. They are introduced separately below.
  • FIG. 6 is a schematic flowchart of a method for determining a target according to an embodiment of the present application.
  • a target determination method provided by an embodiment of the present application may include the following steps:
  • the solution provided in this application can be applied to various scenarios, specifically, the method shown in FIG. 6 can be applied to scenarios such as the field of automatic driving and the field of monitoring.
  • the image to be processed in step 601 may be an image obtained by the vehicle through a visual sensor; specifically, the image to be processed may be an image captured by the vehicle through a camera installed on the vehicle.
  • the image to be processed in step 601 may be an image acquired by a visual sensor installed on the roadside, specifically, the image to be processed may be an image captured by a camera installed on the roadside.
  • the vision sensor may include a lens and an image sensor.
  • the optical image generated by the lens is projected onto the image sensor, and the image sensor converts it into an electrical signal, and then through the analog-to-digital (A/D) conversion and other processing processes, the image to be processed is obtained.
  • the visual sensor can be in any of the following specific forms, for example, a camera, a video camera, a camera, a scanner, or other devices with a camera function (for example, a mobile phone, a tablet computer, etc.).
  • the multiple millimeter-wave detection points and the images to be processed are data obtained synchronously.
  • Each millimeter-wave detection point includes depth information, and the depth information is used to indicate the distance between the detection target and the millimeter-wave radar.
  • the detection target can be any target such as vehicles, people, trees, etc.
  • the data acquired synchronously can be understood as the millimeter-wave radar and the image sensor simultaneously collect data, or it can be understood as the deviation of the frame rate of the data collected by the millimeter-wave radar and the image sensor within a preset range.
  • the millimeter-wave radar collects the millimeter-wave detection points according to the first frame rate
  • the image sensor collects the image to be processed according to the second frame rate
  • if the deviation between the first frame rate and the second frame rate is less than a preset threshold, it can be considered that the millimeter-wave detection points and the image to be processed are data acquired synchronously.
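A minimal sketch of this synchrony check; the application does not specify the preset threshold, so the 2.0 Hz default below is an assumed value:

```python
def synchronously_acquired(radar_rate_hz, camera_rate_hz, threshold_hz=2.0):
    """Treat the radar detection points and the image as synchronously
    acquired data when the deviation between the two sampling rates stays
    within the preset threshold (the 2.0 Hz default is an assumed value)."""
    return abs(radar_rate_hz - camera_rate_hz) <= threshold_hz

print(synchronously_acquired(20.0, 21.0))  # True
print(synchronously_acquired(20.0, 30.0))  # False
```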
  • the millimeter-wave radar emits high-frequency millimeter waves, which are collected by the receiving system after being reflected by the target, and the distance to the target is determined by frequency measurement, thereby forming multiple millimeter-wave detection points.
  • the millimeter wave detection point in step 602 may be the data obtained by the millimeter wave radar installed on the vehicle.
  • the millimeter-wave detection point in step 602 may be data obtained by a millimeter-wave radar on a monitoring device installed on the road.
  • depth information is also sometimes referred to as distance information, and both represent the distance between the target acquired by the millimeter-wave radar and the millimeter-wave radar when the difference between the two is not emphasized.
  • the present application can implement the mapping of multiple millimeter wave detection points to the image to be processed in various ways.
  • a method of mapping multiple millimeter wave detection points to the image to be processed is given below.
  • any related-art method by which a plurality of millimeter-wave detection points can be mapped onto the image to be processed can be used in the embodiments of the present application.
  • the millimeter-wave detection points acquired by the millimeter-wave radar can be mapped, through unification of the coordinate systems, onto the image to be processed acquired by the vision sensor.
  • the millimeter-wave detection point determined by the millimeter-wave radar and the target determined by the vision sensor must be in the same coordinate system for better correlation and matching.
  • the visual sensor coordinate system is (Xc, Yc, Zc)
  • the millimeter-wave radar coordinate system is (Xr, Yr, Zr)
  • the three-dimensional world coordinate system is (Xw, Yw, Zw).
  • the coordinate system where the millimeter-wave radar is located can be used as the benchmark to set the coordinate system where the millimeter-wave radar is located to coincide with the world coordinate system, which can be expressed by the following formula:
  • f represents the focal length of the visual sensor
  • (u0, v0) represents the principal point of the visual sensor
  • dx, dy represent the pixel unit size of the visual sensor in the x and y directions, respectively
  • [-a, -b, 0]^T represents the translation of the visual sensor relative to the millimeter-wave radar
  • represents the rotation angle between the millimeter-wave radar and the vision sensor.
  • the coordinates of the millimeter-wave radar can be converted into the coordinates of the vision sensor, and the detection points of the millimeter-wave radar can be mapped to the image to be processed.
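Under the pinhole model implied by the symbols above (focal length f, principal point (u0, v0), pixel sizes dx and dy, and a rotation and translation between the radar and camera frames), the mapping of a detection point to pixel coordinates can be sketched as follows. All numeric values below are illustrative assumptions, not parameters from the application:

```python
import numpy as np

def project_radar_point(p_radar, R, t, f, u0, v0, dx, dy):
    """Map a millimeter-wave detection point into pixel coordinates.

    p_radar: 3D point in the radar (world) frame.
    R, t: assumed rotation/translation from the radar frame to the camera frame.
    f: focal length; (u0, v0): principal point; dx, dy: pixel unit sizes.
    """
    Xc, Yc, Zc = R @ np.asarray(p_radar) + t  # radar frame -> camera frame
    u = f * Xc / (Zc * dx) + u0               # perspective projection, x axis
    v = f * Yc / (Zc * dy) + v0               # perspective projection, y axis
    return u, v

# Illustrative calibration: aligned axes, camera 0.5 m above the radar,
# 4 mm focal length, 2 um square pixels, VGA principal point.
R = np.eye(3)
t = np.array([0.0, -0.5, 0.0])
u, v = project_radar_point([1.0, 0.5, 20.0], R, t, f=0.004,
                           u0=320.0, v0=240.0, dx=2e-6, dy=2e-6)
print(round(u, 1), round(v, 1))  # 420.0 240.0
```

The depth information of the detection point is carried along with the projected pixel position, so each mapped point keeps its distance to the radar.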
  • the location information is used to indicate where each mmWave detection point is mapped to the image to be processed.
  • a set of candidate frames can be determined according to the depth information and position information of each millimeter wave detection point, and the set of candidate frames includes multiple candidate frames. The following describes how to determine the candidate frame according to the depth information and the position information, respectively.
  • the size of the candidate frame is determined according to the depth information of the millimeter wave detection point.
  • the solution provided in this application uses the principle of pinhole imaging: the closer the object, the larger its image; the farther the object, the smaller its image.
  • the solution provided in this application can set a priori frames; multiple a priori frames can be set, and specifically, multiple areas with different sizes or aspect ratios can be set as a priori frames.
  • the candidate frame is based on these a priori frames, which reduces the difficulty of training to a certain extent.
  • the size of the prior frame may be determined according to the size of a preset category. For example, the solution provided in this application can identify three categories, namely trucks, cars and buses, and the average size of trucks, the average size of cars and the average size of buses can be obtained through a large amount of statistical data. Then, for each millimeter-wave detection point, a priori frames of at least three sizes can be determined; when the size of the candidate frame is determined according to the depth information, the size of each of the three prior frames can be adjusted according to the depth information of the millimeter-wave detection point.
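This per-category adjustment can be sketched as scaling each prior frame inversely with depth, consistent with the negative correlation stated above. The average category sizes and the reference depth are assumed values, since the application does not give the underlying statistics:

```python
# Assumed average sizes (width, height) in pixels at a 10 m reference depth
# for the three preset categories named above; illustrative only.
PRIOR_SIZES = {"car": (120, 90), "truck": (200, 160), "bus": (220, 180)}
REFERENCE_DEPTH_M = 10.0

def prior_boxes_for_point(depth_m):
    """Scale each category's prior frame inversely with depth (pinhole
    model): the farther the target, the smaller its image."""
    scale = REFERENCE_DEPTH_M / depth_m
    return {cat: (w * scale, h * scale) for cat, (w, h) in PRIOR_SIZES.items()}

near = prior_boxes_for_point(5.0)   # closer detection point -> larger frames
far = prior_boxes_for_point(20.0)   # farther detection point -> smaller frames
print(near["car"], far["car"])  # (240.0, 180.0) (60.0, 45.0)
```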
  • the position of the candidate frame is determined according to the position of the millimeter wave detection point mapped to the image to be processed.
  • the position of the candidate frame is determined according to the position of each millimeter wave detection point on the image to be processed.
  • the solution provided in this application determines the position of the candidate frame according to the distribution characteristics of the millimeter wave detection points and the positions of the millimeter wave detection points on the image to be processed.
  • the distribution characteristics of millimeter wave detection points may present different distribution characteristics in different scenarios.
  • the distribution characteristics of the millimeter wave detection points in a certain application scenario can be obtained through a large number of experimental statistics, which are described below with an example.
  • a vehicle can be placed in a clean background environment (a clean background environment can be understood as a scene in which, apart from the vehicle, other surrounding objects are minimized); the millimeter-wave radar emits high-frequency millimeter waves, which are reflected by the vehicle and collected by the receiving system to obtain one statistic.
  • the millimeter-wave radar transmits high-frequency millimeter waves multiple times, and multiple data are counted for the vehicle; alternatively, different vehicles can be substituted, or different numbers of vehicles can be added, and multiple rounds of statistics can be performed to obtain the distribution characteristics of the millimeter-wave detection points on vehicles.
  • the distribution characteristics of millimeter wave detection points on people can be obtained, or the distribution characteristics of millimeter wave detection points on animals can also be obtained, or the distribution characteristics of millimeter wave detection points on goods (such as the distribution characteristics on the shipping box).
  • FIGS. 7a to 7c are schematic diagrams of application scenarios of a target determination method provided by the present application.
  • Figure 7a it is a schematic diagram of the acquired millimeter wave detection points when the detection target is a vehicle.
  • Each millimeter-wave detection point contains the distance between the target and the millimeter-wave radar. Some of these points are due to noise generated by multipath reflection or ray tracing, but these points also contain distance information.
  • In FIG. 7b, how to determine the candidate frame according to the position information is described by taking one millimeter-wave detection point in FIG. 7a as an example. It should be noted that the principle of determining the candidate frame according to the position information of each millimeter-wave detection point is the same, and the description is not repeated one by one.
  • the relationship between the vehicle and the millimeter-wave detection points is determined according to the distribution characteristics of the millimeter-wave detection points on the vehicle, and the millimeter-wave detection point is generally located in the lower left corner of the target vehicle.
  • a plurality of a priori frames are determined with the millimeter-wave detection point at the lower left corner of each prior frame, wherein the number of the plurality of a priori frames is determined according to the pre-specified target categories.
  • alternatively, if the millimeter-wave detection point is generally located below the target vehicle, a number of prior frames can be determined with the millimeter-wave detection point at the bottom of each prior frame.
  • a plurality of a priori frames may also be determined with the millimeter-wave detection point at the center of the lower edge of each prior frame.
  • the number and size of the a priori frames can be understood according to the description in FIG. 7a, and the description is not repeated here.
  • if the relationship between the vehicle and the millimeter-wave detection points, determined according to the distribution characteristics of the millimeter-wave detection points on the vehicle, is that the millimeter-wave detection points are generally located below the target vehicle, then, as shown in FIG. 7d, a plurality of a priori frames can be determined with the millimeter-wave detection point at the middle position of the lower edge of each prior frame.
  • Alternatively, a number of a priori frames are determined with the millimeter-wave detection point at the lower left corner of each prior frame. It can be seen from FIGS. 7c to 7d that the solution provided by the present application can determine multiple prior frames according to the distribution characteristics of the millimeter-wave detection points on a certain target, such as the distribution characteristics on a vehicle. It should be noted that determining multiple a priori frames with the millimeter-wave detection point at the lower edge or at the lower left corner of the prior frame, as in FIGS. 7a to 7d, is only a preferred solution of the scheme provided in this application.
  • In practical applications, other methods of determining the a priori frame may also be selected according to the distribution characteristics of the millimeter-wave detection points on the target; for example, a plurality of candidate frames can be determined with the millimeter-wave detection point at any position on the left side of the prior frame.
  • the present application determines the position of the candidate frame according to the distribution characteristics of the millimeter wave detection points on the target, so that the target can be better selected, in other words, the millimeter wave detection point can be better associated with the position of the target.
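The position determination described above can be sketched as anchoring each prior frame so that the mapped detection point sits at the frame's lower-left corner or at the middle of its lower edge, following the assumed distribution characteristic; the pixel coordinates and frame sizes below are illustrative:

```python
def boxes_from_detection_point(u, v, prior_sizes, anchor="lower_left"):
    """Place one candidate frame per prior size so that the mapped
    detection point (u, v) sits at the frame's lower-left corner or at
    the middle of its lower edge (assumed distribution characteristic)."""
    boxes = []
    for w, h in prior_sizes:
        if anchor == "lower_left":
            x0, y0 = u, v - h            # point at the lower-left corner
        else:
            x0, y0 = u - w / 2, v - h    # point centred on the lower edge
        boxes.append((x0, y0, x0 + w, y0 + h))
    return boxes

sizes = [(40, 30), (80, 60)]  # illustrative prior-frame sizes in pixels
print(boxes_from_detection_point(100, 200, sizes))
# [(100, 170, 140, 200), (100, 140, 180, 200)]
```

Combining this placement with the depth-based size adjustment yields one set of candidate frames per millimeter-wave detection point.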
  • the application obtains through a large number of experiments that the millimeter-wave detection points are mostly distributed on the bottom and sides of the vehicle.
  • FIG. 7e is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • In FIG. 7e, taking two millimeter-wave detection points as an example, the determination of the candidate frame according to the depth information of the millimeter-wave detection points is described.
  • In FIG. 7e, assuming that the depth information of millimeter-wave detection point A is smaller than that of millimeter-wave detection point B, the size of the candidate frame determined according to millimeter-wave detection point A should be larger than the size of the candidate frame determined according to millimeter-wave detection point B, since the size of the candidate frame is negatively correlated with the depth information.
  • the negative correlation between the depth information and the size of the candidate frame it can be understood with reference to the principle of pinhole imaging, which will not be described in this application.
  • NMS: non-maximum suppression
  • NMS processing is performed on multiple candidate frames according to the depth information to output the target frame and the target millimeter wave detection point, and the target frame is determined according to the depth information and position information of the target millimeter wave detection point.
  • This application takes one target as an example for description, but it should be noted that when there are multiple targets, the solutions provided in this application are also applicable.
• Non-maximum suppression suppresses elements that are not maxima; the main purpose of the method is to reduce the number of candidate frames.
• A large number of candidate frames can be determined according to the depth information and position information of the millimeter-wave detection points; each candidate frame corresponds to a probability value, and each candidate frame also corresponds to the depth value of a millimeter-wave detection point.
• The redundant candidate frames can be removed by the NMS method to determine the final candidate frame. It should be noted that this application sometimes refers to the final candidate frame as the target frame; when the difference between the two is not emphasized, both represent the frame output after NMS processing, and the frame is used to represent the location of the target.
• The input of NMS is N candidate boxes that have been sorted by score from high to low, and the output is the M highest-scoring, unsuppressed candidate boxes, where N is a positive integer greater than M, and the score of each candidate box is determined according to the depth information.
• For example, assume the targets include 3 categories, namely the first category, the second category, and the third category. Suppose that, through a large number of statistics or through neural network learning, the probability distribution between target size and depth information is determined for each category: the A probability distribution for the first category, the B probability distribution for the second category, and the C probability distribution for the third category. Then, according to the depth information of the target in a candidate frame and its relationship to the A, B, and C probability distributions, the probability that the target in the candidate box belongs to a certain category can be determined.
• Suppose candidate frames B and D are discarded, and candidate frame F is marked as the first retained candidate frame. From candidate frames A, C, and E, select the candidate frame E with the highest probability, then judge the degree of overlap of candidate frames A and C with E; if the degree of overlap is greater than a certain threshold, discard them, and mark candidate frame E as the second retained candidate frame. Repeat this process until all remaining candidate boxes are found; these are the final candidate boxes. The millimeter-wave detection points corresponding to the detection targets in the final candidate frames are then output.
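• The iterative procedure above (keep the highest-probability frame, discard frames overlapping it beyond a threshold, repeat on the remainder) can be sketched as follows; the overlap threshold of 0.5 is an illustrative assumption:

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(boxes, scores, threshold=0.5):
    """Keep the highest-scoring box, discard boxes whose overlap with it
    exceeds `threshold`, and repeat on the remainder. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep
```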
• Multiple candidate frames of the image to be processed are determined from the position information and depth information of the millimeter-wave detection points, and NMS processing is performed on the multiple candidate frames according to the depth information, so that the millimeter-wave detection points associated with the retained candidate frames can be output, improving the accuracy of target matching.
• NMS processing may be performed on multiple candidate frames according to depth information alone, and in some possible implementations, NMS processing may also be performed on multiple candidate frames according to depth information combined with other information. In addition, there may be various ways to determine the probability distribution between the depth information and a certain category. The embodiment corresponding to FIG. 6 is further refined and expanded below.
  • FIG. 8 is a schematic flowchart of another target determination method provided by an embodiment of the present application.
  • another target determination method provided by this embodiment of the present application may include the following steps:
  • Steps 801 to 804 can be understood with reference to steps 601 to 604 in the embodiment corresponding to FIG. 6 , and details are not repeated here.
  • the first score represents the probability determined by the classifier that the detection target in each candidate frame belongs to each of the N categories, where the N categories are preset categories, and N is a positive integer.
  • the second score represents the probability that the detection target in each candidate frame belongs to each of the N categories, determined according to the depth information and the first probability distribution between each category.
• The input of the NMS is N candidate boxes sorted by score from high to low, and the output is the M highest-scoring, unsuppressed candidate boxes, where N is a positive integer greater than M, and the score of each candidate box is determined according to the product of the first score and the second score.
  • the embodiment of the present application does not limit the type of the classifier.
• The classifier scores each input candidate box; the higher the score, the higher the probability that a target of the corresponding category is present in the candidate box. Any method in the related art for determining the score of each candidate frame with a classifier can be adopted in the embodiments of the present application.
• For the candidate frames processed by the regression network, if NMS processing is performed on multiple candidate frames only according to the first score, the situation shown in FIG. 9a may occur, that is, there are a large number of repetitions and interferences in the results.
  • the method adds a dimension to determine the millimeter wave detection point corresponding to the candidate area, so as to improve the accuracy of the association and also improve the accuracy of the target detection.
• Referring to FIG. 9a, assuming that the candidate frames are sorted from high to low according to the first score, there may be 3 millimeter-wave detection points associated with the final output candidate frame after NMS processing according to the first score alone. If, among them, the depth information of millimeter-wave detection point A has the highest probability of corresponding to the category of the candidate frame, then after NMS processing, as shown in FIG. 9b, the candidate box and millimeter-wave detection point A are output.
• The score of each candidate frame may be determined according to the following formula, that is, according to the first score and the second score: the input of NMS is N candidate boxes sorted from high to low by the scores determined from the first score and the second score, and the M highest-scoring, unsuppressed final candidate boxes are output.
• The score of each candidate box can be expressed by the following formula:

  score = score_cls × p(depth | classes), where p(depth | classes) = N(depth; mean, std)

  in which score represents the score determined according to the first score and the second score; score_cls represents the first score output by the classifier; depth represents the depth information of each millimeter-wave detection point; classes represents the category of the target (the number of categories is preset, as explained above and not repeated here); p(A, B) represents the probability of A and B occurring at the same time, that is, the joint distribution of depth information and category; p(A | B) represents the probability of A occurring given that B occurs, that is, the distribution of the depth information corresponding to a category; mean represents the mean value; std represents the standard deviation; and N denotes a Gaussian distribution.
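• One reading of the scoring scheme above can be sketched as follows: the second score is modeled as the Gaussian likelihood of the observed depth under a category's depth distribution, and the ranking score is the product of the first score and the second score. The classifier score, mean, and standard deviation values used here are hypothetical:

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of N(x; mean, std)."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def combined_score(cls_score, depth, depth_mean, depth_std):
    """Ranking score = first score (classifier) x second score (depth likelihood)."""
    return cls_score * gaussian_pdf(depth, depth_mean, depth_std)
```

A detection whose measured depth matches the category's typical depth distribution thus outranks one with the same classifier score but an implausible depth.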
• The solution provided by this application performs NMS processing on multiple candidate frames based on both the first score and the second score, so as to improve the accuracy of data association, that is, to improve the accuracy of the one-to-one correspondence between the detections of the same target by different sensors.
  • the following describes how to determine the probability distribution between the depth information and a certain category on the basis of the embodiments corresponding to FIG. 6 and FIG. 8 .
• Statistics are performed on the data in the first set to determine the probability distribution of the first size of the statistical targets corresponding to each category, where the first set includes a plurality of statistical targets corresponding to each category and the size information of each statistical target.
  • the first probability distribution is determined according to the probability distribution of the first size and the first relationship, where the first relationship is the relationship between the size of the statistical target and the depth information of the millimeter wave detection point corresponding to the statistical target.
• For example, assume that the first set includes 3 categories, namely trucks, cars, and buses, with 1000 statistical targets corresponding to each of trucks, cars, and buses.
• Each statistical target includes size information; for example, a statistical target A includes at least one of A's physical size, A's length information, A's width information, or A's height information.
• From the size information of each statistical target, the probability distribution of the first size can be obtained, that is, the probability distribution of the size of the statistical targets under each category.
  • FIG. 10 shows a schematic diagram of the probability distribution of the first size when the size information is the height information of the target.
• The relationship between the depth information and the size of the target can be determined through the principle of pinhole imaging; for example, the distance between the target and the millimeter-wave radar can be adjusted multiple times to obtain this relationship. Once the relationship between the depth information and the size of the target, and the probability distribution of the size of the target under each category, are obtained, the probability distribution between the depth information and each category can be determined.
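• As a sketch of how such statistics and the pinhole relationship might be combined, one can fit a Gaussian to the size samples of a category and relate physical size to depth through depth = f · H / h (focal length f in pixels, physical height H, pixel height h). The focal length and sample values are invented for illustration:

```python
import math

def fit_gaussian(samples):
    """Fit mean and standard deviation to the size samples of one category."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, math.sqrt(var)

def depth_from_height(physical_height_m, pixel_height, focal_px=1000.0):
    """Pinhole model: depth = f * H / h."""
    return focal_px * physical_height_m / pixel_height
```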
• In some possible implementations, the first set may be updated, and the probability distribution between the depth information and each category may be determined from the updated set. For example, statistics may be performed on the data in a second set to determine the second size distribution of the statistical targets corresponding to each category; the second size distribution is used to update the first size distribution, and the second set includes a plurality of statistical targets corresponding to each category and the size information of each statistical target.
  • the first probability distribution is determined according to the second size distribution and the second relationship, where the second relationship is the relationship between the size of the statistical target and the depth information of the millimeter wave detection point corresponding to the statistical target.
  • FIG. 11 is a schematic flowchart of another target determination method provided by an embodiment of the present application.
• The image is collected by the vision sensor and the millimeter-wave detection points are acquired by the millimeter-wave radar; the millimeter-wave detection points and the image frames in the video are time-aligned.
  • the sampling frequency of the visual sensor is the first frequency
  • the sampling frequency of the millimeter wave radar is the second frequency
  • the difference between the first frequency and the second frequency is not greater than a preset threshold.
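• Time alignment between the two sensors can be sketched as matching each image frame to the radar measurement with the nearest timestamp, discarding pairs whose gap exceeds a tolerance; the tolerance value below is an assumption, not a value specified in this application:

```python
def align(frame_times, radar_times, max_gap=0.02):
    """For each image timestamp, pick the index of the closest radar
    timestamp; return None when the gap exceeds max_gap seconds."""
    pairs = []
    for t in frame_times:
        j = min(range(len(radar_times)), key=lambda i: abs(radar_times[i] - t))
        pairs.append(j if abs(radar_times[j] - t) <= max_gap else None)
    return pairs
```

The closer the two sampling frequencies, the smaller the worst-case timestamp gap, which is why the difference between the first and second frequencies is bounded by a preset threshold.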
• A first feature map of the image is obtained, multiple candidate frames are generated according to the positions of the millimeter-wave detection points on the image and the depth information of the millimeter-wave detection points, and second feature maps corresponding to the multiple candidate frames are extracted from the first feature map. The second feature maps are processed by the classification layer and the regression layer, and NMS processing is performed on the processed results according to the depth information, so as to output the detection result of the image and the millimeter-wave detection point associated with that detection result.
  • FIG. 12 is a schematic diagram of an application scenario of a target determination method provided by the present application.
  • the solution provided by this application can be applied in the field of automatic driving.
• Objects on the road can be detected and recognized, such as vehicles: the position of a vehicle within the image range obtained by the vision sensor, the type of the vehicle, and the distance between the vehicle and the own vehicle can all be detected.
  • the self-vehicle refers to the vehicle on which the visual sensor is installed.
  • the detection results of the vision sensor and the detection results of the millimeter wave detection points are matched and associated one by one.
  • the solution provided in this application can be applied in any scenario where the target-level detection results of the vision sensor and the millimeter-wave radar need to be correlated.
  • the solution provided by this application can be applied in a monitoring scenario.
• Objects in the monitoring area can be detected and identified, for example, vehicles or people in the monitoring area; for each target, the detection results of the vision sensor and the detection results of the millimeter-wave detection points are matched and associated one by one.
• FIG. 13 includes a comparison between the association results determined by the target determination method provided by the present application and the results of a first solution, which is a method that does not perform NMS processing according to depth information.
• The application can be tested on a self-built database, where the self-built database includes a plurality of images provided with millimeter-wave detection points, and the images carry manual category annotations.
• AP 50 denotes the average precision when the overlap ratio between the final candidate frame in the image and the target object to be detected is at least 50%.
• In the target determination method provided by the present application, NMS processing is performed according to the first score determined by the classifier and the second score determined according to the depth information, which improves the accuracy of target detection. It can therefore be seen from FIG. 13 that the AP 50 output by the convolutional neural network used to recognize pictures is significantly higher for the target determination method provided by the present application than for the first solution; the target determination method provided by the present application thus also improves the accuracy of target detection.
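• The AP 50 criterion discussed above can be illustrated with a minimal overlap check: a predicted frame counts as a correct detection when its intersection-over-union with the annotated frame is at least 0.5. A self-contained sketch:

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def correct_at_50(pred_box, gt_box):
    """True positive under the AP50 criterion: IoU >= 0.5."""
    return iou(pred_box, gt_box) >= 0.5
```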
• The following describes the training process, that is, a model training method.
  • FIG. 14 is a schematic flowchart of a model training method provided by an embodiment of the present application.
  • a model training method provided by an embodiment of the present application may include the following steps:
• The training data consists of multiple training images with millimeter-wave detection points mapped onto them; the training images and millimeter-wave detection points are data acquired synchronously for the same target.
  • the target is sometimes referred to as the target object, and the two have the same meaning unless the difference between the two is emphasized.
  • the training image carries the label information of the target object, and the label information of the target object can be obtained by manual annotation.
  • the training image is also the original image used to train the target detection model, and the label information of the target object can be understood as the ground truth (GT) used to train the target detection model.
  • step 1402 how to map the millimeter wave detection points to the training image can be understood by referring to step 1402 in the embodiment corresponding to FIG. 6 to map multiple millimeter wave detection points to the image to be processed, which will not be repeated here.
  • a set of candidate frames can be determined according to the depth information and position information of each millimeter wave detection point, and the set of candidate frames includes multiple candidate frames. It can be understood with reference to step 604 in the embodiment corresponding to FIG. 6 , and details are not repeated here.
• The training data can be input into the model; for example, the model can be Fast R-CNN, Faster R-CNN, a mask region-based convolutional neural network (Mask R-CNN), and so on.
• The first feature map of the training data can be obtained through the model, multiple candidate frames are generated according to the positions of the millimeter-wave detection points on the image and their depth information, and the second feature maps corresponding to the multiple candidate frames are extracted from the first feature map.
• The model may be trained according to the second feature maps, and the training may be determined to be complete when the loss function of the model converges.
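• The "train until the loss function converges" step can be sketched abstractly as follows; the convergence tolerance and the step function are placeholders rather than the training procedure of the present application:

```python
def train_until_converged(step_fn, tol=1e-4, max_steps=10000):
    """Run training steps until the change in loss drops below tol.
    step_fn() performs one optimization step and returns the loss."""
    prev = float("inf")
    for step in range(max_steps):
        loss = step_fn()
        if abs(prev - loss) < tol:
            return step + 1, loss  # converged
        prev = loss
    return max_steps, prev  # stopped at the step budget
```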
  • the flow of the target determination method and the model training method provided by the present application has been introduced in detail.
  • the target determination device and the model training device provided by the present application are described below based on the aforementioned target determination method and model training method.
• The target determination apparatus is configured to execute the steps of the methods corresponding to the foregoing FIGS. 6-12, and the model training apparatus is configured to execute the steps of the foregoing method corresponding to FIG. 14.
  • the target determination device includes:
  • the acquisition module 1501 is configured to acquire an image to be processed and multiple millimeter wave detection points, where the image to be processed and the multiple millimeter wave detection points are data obtained synchronously for the same target.
  • the image to be processed can be acquired by a vision sensor
  • the millimeter wave detection point can be acquired by a millimeter wave radar.
  • Each millimeter-wave detection point may include depth information, where the depth information is used to represent the distance between the detection target and the millimeter-wave radar, and the millimeter-wave radar is used to acquire multiple millimeter-wave detection points.
  • the mapping module is configured to map the multiple millimeter wave detection points acquired by the acquisition module 1501 to the image to be processed acquired by the acquisition module 1501 .
  • the processing module 1502 is used to determine a plurality of candidate frames of the detection target on the image to be processed according to the first information.
• The first information may include the depth information and position information of each millimeter-wave detection point, where the position information is used to represent the location at which each millimeter-wave detection point is mapped on the image to be processed.
• The processing module 1502 is also used to perform non-maximum suppression (NMS) processing on the multiple candidate frames according to the depth information, so as to output the target frame and the target millimeter-wave detection point (that is, to output the association result between the target-level detections of the same detection target by different sensors).
  • the target frame is determined according to the depth information and position information of the target millimeter wave detection point.
• In a possible implementation, the processing module 1502 is specifically configured to perform non-maximum suppression (NMS) processing on the multiple candidate frames according to a first score and a second score, where the first score represents the probability, determined by the classifier, that the detection target in each candidate box belongs to each of N categories (the N categories are preset categories, and N is a positive integer), and the second score represents the probability, determined according to the depth information and the first probability distribution of each category, that the detection target in each candidate box belongs to each of the N categories.
  • the target determination device may further include a statistics module 1503, the statistics module 1503 is configured to perform statistics on the data in the first set, and determine the probability distribution of the first size of the statistical target corresponding to each category
  • the first set may include a plurality of statistical objects corresponding to each category, and size information of each statistical object.
  • the first probability distribution is determined according to the probability distribution of the first size and the first relationship, where the first relationship is the relationship between the size of the statistical target and the depth information of the millimeter wave detection point corresponding to the statistical target.
  • the statistics module 1503 is further configured to: perform statistics on the data in the second set, and determine the probability distribution of the second size of the statistical target corresponding to each category, and the second set may include each category Corresponding multiple statistical targets, and size information of each statistical target.
  • the second probability distribution is determined according to the second size distribution and the second relationship, the second probability distribution is used to update the first probability distribution, and the second relationship is the difference between the size of the statistical target and the depth information of the millimeter wave detection point corresponding to the statistical target relationship between.
  • the size information is height information of the statistical target.
  • the position information is used to determine the position of the candidate frame in the image to be processed in combination with the distribution characteristics of the millimeter wave detection points on the vehicle.
  • the depth information is used to determine the size of the candidate frame, and the size of the candidate frame is negatively correlated with the depth information.
  • the processing module 1502 is further configured to: perform convolution processing on the image to be processed to obtain a first feature map of the image to be processed.
  • the second feature maps corresponding to the plurality of candidate frames are extracted from the first feature map.
  • the second feature map is processed through a regression network and a classifier to obtain a first result, and the first result is used for non-maximum suppression NMS processing.
• The image to be processed is acquired by the visual sensor, and the sampling frequency of the visual sensor is the first frequency.
  • the sampling frequency of the millimeter wave radar is the second frequency
  • the difference between the first frequency and the second frequency is not greater than a preset threshold.
  • the model training device includes:
  • the acquiring module 1601 is configured to perform step 1401 in the embodiment corresponding to FIG. 14 .
  • the training module 1602 is configured to perform steps 1402 and 1403 in the embodiment corresponding to FIG. 14 .
  • FIG. 17 is a schematic structural diagram of another target determination apparatus provided by the present application, as described below.
  • the target determination apparatus may include a processor 1701 and a memory 1702 .
  • the processor 1701 and the memory 1702 are interconnected by wires.
  • the memory 1702 stores program instructions and data.
  • the program instructions and data corresponding to the steps in FIG. 6 or FIG. 8 are stored in the memory 1702 .
  • the processor 1701 is configured to perform the method steps performed by the target determination apparatus shown in any of the foregoing embodiments in FIG. 6 or FIG. 8 .
• Embodiments of the present application also provide a computer-readable storage medium in which a program is stored; when the program runs on a computer, the computer is made to execute the steps in the methods described in the embodiments shown in FIG. 6 or FIG. 8.
  • the aforementioned target determination device shown in FIG. 17 is a chip.
  • the embodiments of the present application also provide a digital processing chip.
• The digital processing chip integrates circuits and one or more interfaces for implementing the functions of the above-mentioned processor 1701.
  • the digital processing chip can perform the method steps of any one or more of the foregoing embodiments.
  • the digital processing chip does not integrate the memory, it can be connected with the external memory through the communication interface.
  • the digital processing chip implements the actions performed by the target determination device in the above embodiment according to the program codes stored in the external memory.
• An embodiment of the present application also provides a computer program product which, when running on a computer, causes the computer to execute the steps performed by the target determination apparatus in the methods described in the embodiments shown in FIG. 6 or FIG. 8.
  • the target determination apparatus may be a chip, and the chip includes: a processing unit and a communication unit.
  • the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit can execute the computer-executed instructions stored in the storage unit, so that the chip in the server executes the target determination method described in the embodiment shown in FIG. 6 or FIG. 8 .
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
• The storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), etc.
• The aforementioned processing unit or processor may be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general purpose processor may be a microprocessor or it may be any conventional processor or the like.
  • FIG. 18 is a schematic structural diagram of a chip provided by an embodiment of the application.
• The chip may be represented as a neural network processor (NPU) 180; the NPU 180 is mounted as a co-processor to the host CPU, which allocates tasks to it.
  • the core part of the NPU is the arithmetic circuit 1803, which is controlled by the controller 1804 to extract the matrix data in the memory and perform multiplication operations.
  • the arithmetic circuit 1803 includes multiple processing units (process engines, PEs). In some implementations, the arithmetic circuit 1803 is a two-dimensional systolic array. The arithmetic circuit 1803 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, arithmetic circuit 1803 is a general-purpose matrix processor.
  • the operation circuit fetches the data corresponding to the matrix B from the weight memory 1802 and buffers it on each PE in the operation circuit.
  • the arithmetic circuit fetches the data of matrix A and matrix B from the input memory 1801 to perform matrix operation, and stores the partial result or final result of the matrix in the accumulator 1808 .
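• Functionally, the operation described (buffer matrix B in the PEs, stream in matrix A, accumulate partial results) computes a matrix product. A plain scalar sketch of that accumulation, not a model of the systolic hardware itself:

```python
def matmul(A, B):
    """C = A x B with explicit accumulation, mirroring how partial
    products are summed into the accumulator for each output element."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0  # accumulator for one output element
            for k in range(inner):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc
    return C
```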
  • Unified memory 1806 is used to store input data and output data.
• The weight data is transferred to the weight memory 1802 through the direct memory access controller (DMAC) 1805.
  • Input data is also moved to unified memory 1806 via the DMAC.
  • a bus interface unit (BIU) 1810 is used for the interaction between the AXI bus and the DMAC and the instruction fetch buffer (Instruction Fetch Buffer, IFB) 1809.
• The bus interface unit 1810 is used for the instruction fetch buffer 1809 to obtain instructions from the external memory, and also for the storage unit access controller 1805 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • the DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1806 , the weight data to the weight memory 1802 , or the input data to the input memory 1801 .
  • the vector calculation unit 1807 includes a plurality of operation processing units, and if necessary, further processes the output of the operation circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison and so on. It is mainly used for non-convolutional/fully connected layer network computations in neural networks, such as batch normalization, pixel-level summation, and upsampling of feature planes.
  • the vector computation unit 1807 can store the processed output vectors to the unified memory 1806 .
  • the vector calculation unit 1807 may apply a linear function and/or a non-linear function to the output of the operation circuit 1803, such as linear interpolation of the feature plane extracted by the convolution layer, such as a vector of accumulated values, to generate activation values.
  • the vector computation unit 1807 generates normalized values, pixel-level summed values, or both.
  • the vector of processed outputs can be used as activation input to the arithmetic circuit 1803, such as for use in subsequent layers in a neural network.
  • the instruction fetch buffer (instruction fetch buffer) 1809 connected to the controller 1804 is used to store the instructions used by the controller 1804;
  • the unified memory 1806, the input memory 1801, the weight memory 1802 and the instruction fetch memory 1809 are all On-Chip memories. External memory is private to the NPU hardware architecture.
  • the operation of each layer in an RNN can be performed by the operation circuit 1803 or the vector calculation unit 1807 .
  • the processor mentioned in any one of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the program of the method in FIG. 6 or FIG. 8 .
  • the device embodiments described above are only illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
  • USB flash drive (U disk)
  • removable hard disk
  • read-only memory (ROM)
  • random access memory (RAM)
  • magnetic disk or optical disc, etc.
  • a computer device (which may be a personal computer, a server, a network device, or the like) executes the methods described in the various embodiments of the present application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media.
  • the usable media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid state disks (SSDs)), and the like.
  • modules may be combined or integrated into another system, or some features may be ignored.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between modules may be electrical or in other similar forms.
  • the modules or sub-modules described as separate components may or may not be physically separate, may or may not be physical modules, or may be distributed into multiple circuit modules; some or all of them may be selected according to actual needs to achieve the purpose of the solution of this application.
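The non-convolutional operations attributed to the vector calculation unit 1807 above (for example batch normalization and upsampling of a feature plane; pixel-level summation is simply elementwise addition) can be sketched in NumPy as follows. These are generic definitions of the operations, not the NPU's actual interface:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch normalization over the batch axis (axis 0)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def upsample_nearest(plane, factor=2):
    """Nearest-neighbor upsampling of an H x W feature plane."""
    return plane.repeat(factor, axis=0).repeat(factor, axis=1)

x = np.array([[1.0, 2.0], [3.0, 4.0]])
bn = batch_norm(x)          # zero-mean, unit-variance per column
up = upsample_nearest(x)    # 2x2 plane -> 4x4 plane
```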

Landscapes

  • Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Electromagnetism (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Traffic Control Systems (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

A target determination method, relating to the field of perception fusion and applied in an intelligent vehicle or an intelligent connected vehicle. The method comprises: obtaining an image to be processed and a plurality of millimeter-wave detection points, where the image and the plurality of millimeter-wave detection points are data synchronously obtained for the same detection target and each millimeter-wave detection point comprises depth information; mapping the plurality of millimeter-wave detection points onto the image; determining, according to first information, a plurality of candidate boxes of the detection target on the image, the first information comprising the depth information and position information of each millimeter-wave detection point; and performing non-maximum suppression (NMS) processing on the plurality of candidate boxes according to the depth information to output a target box and a target millimeter-wave detection point, the target box being determined according to the depth information and position information of the target millimeter-wave detection point. The method can improve the accuracy of associating the detection results of a vision sensor and a millimeter-wave radar.

Description

A target determination method and target determination device
This application claims priority to Chinese Patent Application No. 202010692086.4, filed with the China National Intellectual Property Administration on July 17, 2020 and entitled "A target determination method and target determination device", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of communication technologies, and in particular to a target determination method and a target determination device.
Background
Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that responds in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.
Target detection and recognition refers to finding targets in a scene (for example, an image), and can include two processes: detection and recognition. Detection refers to determining whether a target exists and, if it does, determining its position. Recognition refers to determining the category of the target. Target detection and recognition are widely used in many fields, for example autonomous driving and driving-assistance early warning. In target detection and recognition, multi-sensor fusion is usually required: for example, the data collected by lidar, millimeter-wave radar, vision sensors, infrared sensors, and the like are fused to obtain information about the environment around the vehicle, that is, to detect and recognize targets in the environment around the vehicle.
However, to accurately fuse multi-sensor data, targets must be associated one-to-one across the different sensors, that is, multi-sensor target matching must be performed. After multi-sensor target matching is completed, accurate information about the target can be obtained through fusion. Because different sensors have different characteristics, associating the detection results of heterogeneous sensors is difficult; associating the detection results of vision sensors and millimeter-wave radars is particularly difficult.
Summary of the Invention
Embodiments of the present application provide a target determination method that can improve the accuracy of associating the detection results of a vision sensor and a millimeter-wave radar.
To achieve the above purpose, embodiments of the present application provide the following technical solutions:
A first aspect of the present application provides a target determination method, which can be applied in the field of autonomous driving or in the field of surveillance. The method may include: acquiring an image to be processed and a plurality of millimeter-wave detection points, where the image to be processed and the plurality of millimeter-wave detection points are data acquired synchronously for the same detection target. A millimeter-wave radar works by using a high-frequency circuit to generate electromagnetic waves at a specific modulation frequency, transmitting these waves and receiving the waves reflected from the target through an antenna, and computing the parameters of the target from the parameters of the transmitted and received waves. A millimeter-wave radar can measure the range, velocity, and azimuth of multiple targets simultaneously: velocity is measured via the Doppler effect, while azimuth (both horizontal and vertical angles) is measured by means of an antenna array. A millimeter-wave detection point can thus be understood as including various parameters of the target; specifically, a millimeter-wave detection point in this solution includes depth information, that is, a parameter obtained through ranging. Of course, a millimeter-wave detection point may also include other parameters, for example the velocity information of the target (a parameter obtained through velocity measurement) and the azimuth information of the target (a parameter obtained through azimuth measurement). Synchronously acquired data can be understood to mean that the millimeter-wave radar and the image sensor collect data at the same time, or that the deviation between the frame rates at which they collect data is within a preset range. For example, for the same detection target, the millimeter-wave radar collects millimeter-wave detection points at a first frame rate and the image sensor collects the image to be processed at a second frame rate; if the deviation between the first frame rate and the second frame rate is less than a preset threshold, the millimeter-wave detection points and the image to be processed can be considered synchronously acquired data. The image to be processed can be obtained through a vision sensor, and the plurality of millimeter-wave detection points can be obtained through a millimeter-wave radar. When the method provided in this application is applied to an autonomous driving scenario, the image to be processed may be an image acquired by the vehicle through a vision sensor, specifically an image captured by a camera mounted on the vehicle. When the method is applied to a surveillance scenario, the image to be processed may be an image acquired by a roadside vision sensor, specifically an image captured by a roadside camera. Each millimeter-wave detection point may include depth information, which represents the distance between the detection target and the millimeter-wave radar; the millimeter-wave radar is used to acquire the plurality of millimeter-wave detection points. The detection target may be any target, such as a vehicle, a person, or a tree. The plurality of millimeter-wave detection points are mapped onto the image to be processed. A plurality of candidate boxes of the detection target on the image to be processed are determined according to first information, where the first information may include the depth information and position information of each millimeter-wave detection point, and the position information represents the position at which each millimeter-wave detection point is mapped onto the image to be processed. A set of candidate boxes, comprising multiple candidate boxes, can be determined from the depth information and position information of each millimeter-wave detection point. Non-maximum suppression (NMS) processing is performed on the plurality of candidate boxes according to the depth information, so as to output a target box and a target millimeter-wave detection point, where the target box is determined according to the depth information and position information of the target millimeter-wave detection point. It can be seen from the first aspect that, by mapping the millimeter-wave detection points onto the image to be processed, determining multiple candidate boxes of the image from the position information and depth information of the detection points, and performing NMS on the candidate boxes according to the depth information, the millimeter-wave detection point associated with the final candidate box can be output together with it, improving the accuracy of target matching.
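The mapping step above depends on calibration between the radar and the camera, which this application does not detail. The following is a minimal sketch under two assumptions not taken from this application: an ideal pinhole camera model, and a known rigid transform from the radar frame to the camera frame.

```python
import numpy as np

def project_radar_points(points_radar, R, t, K):
    """Project radar points (N x 3, radar frame, meters) onto the image.

    R, t: rotation (3x3) and translation (3,) from the radar frame to the
    camera frame; K: 3x3 camera intrinsic matrix. Returns pixel coordinates
    (N x 2) and per-point depth (N,) along the camera's optical axis.
    """
    pts_cam = points_radar @ R.T + t   # radar frame -> camera frame
    depth = pts_cam[:, 2]              # distance along the optical axis
    uv = pts_cam @ K.T                 # apply intrinsics
    uv = uv[:, :2] / uv[:, 2:3]        # perspective division
    return uv, depth

# Identity extrinsics and a simple intrinsic matrix, for illustration only.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
uv, depth = project_radar_points(np.array([[2.0, 0.0, 20.0]]),
                                 np.eye(3), np.zeros(3), K)
```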
Optionally, with reference to the first aspect, in a first possible implementation, performing non-maximum suppression (NMS) processing on the plurality of candidate boxes according to the depth information may include: performing NMS processing on the plurality of candidate boxes according to a first score and a second score, where the first score represents the probability, determined by a classifier, that the detection target in each candidate box belongs to each of N categories, the N categories being preset categories and N being a positive integer, and the second score represents the probability, determined according to the depth information and a first probability distribution between the depth information and each category, that the detection target in each candidate box belongs to each of the N categories. This first possible implementation gives a specific way of performing NMS processing on multiple candidate boxes according to depth information: performing NMS using both the first score and the second score improves the accuracy of data association, that is, the accuracy of the one-to-one matching of the same target across different sensors.
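How the first score and the second score are combined during NMS is not specified here; one plausible reading is to rank boxes by the product of the two scores and suppress overlapping boxes as usual. A minimal sketch under that assumption (the function names and the IoU threshold are illustrative):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter) if inter else 0.0

def depth_aware_nms(boxes, cls_scores, depth_scores, iou_thr=0.5):
    """boxes: list of (x1, y1, x2, y2). Returns indices of the kept boxes,
    ranked by the combined score cls_score * depth_score."""
    order = sorted(range(len(boxes)),
                   key=lambda i: cls_scores[i] * depth_scores[i],
                   reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in kept):
            kept.append(i)
    return kept

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
# Box 1 wins over box 0 once the depth-derived score is factored in.
kept = depth_aware_nms(boxes, [0.9, 0.8, 0.7], [0.5, 0.9, 0.8])
```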
Optionally, with reference to the first possible implementation of the first aspect, in a second possible implementation, the method may further include: performing statistics on the data in a first set to determine, for each category, a probability distribution of a first size of the statistical targets, where the first set may include a plurality of statistical targets corresponding to each category and the size information of each statistical target; and determining the first probability distribution according to the probability distribution of the first size and a first relationship, the first relationship being the relationship between the size of a statistical target and the depth information of the millimeter-wave detection point corresponding to that statistical target. This second possible implementation gives a specific way of determining the first probability distribution, increasing the diversity of the solution.
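As an illustration of how a first probability distribution of this kind could produce the second score, the sketch below assumes per-class Gaussian height statistics and the pinhole relation H ≈ h·d/f between a box's pixel height h, the radar depth d, and the target's physical height H. The class names, height statistics, and focal length are invented for the example and are not taken from this application:

```python
import math

# Hypothetical per-class physical-height statistics (mean, std, in meters),
# standing in for the distributions derived from the "first set".
HEIGHT_STATS = {"car": (1.5, 0.2), "truck": (3.2, 0.5), "person": (1.7, 0.15)}
FOCAL_PX = 1000.0  # assumed focal length in pixels

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def depth_class_scores(box_height_px, depth_m):
    """Second-score sketch: infer the physical height implied by the box's
    pixel height and the radar depth (H = h * d / f), then score each class
    by its height distribution, normalized over the classes."""
    implied_height = box_height_px * depth_m / FOCAL_PX
    raw = {c: gauss_pdf(implied_height, mu, s) for c, (mu, s) in HEIGHT_STATS.items()}
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()}

# A 75-px-tall box at 20 m depth implies a 1.5 m target.
scores = depth_class_scores(box_height_px=75.0, depth_m=20.0)
best = max(scores, key=scores.get)
```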
Optionally, with reference to the second possible implementation of the first aspect, in a third possible implementation, the method may further include: performing statistics on the data in a second set to determine, for each category, a probability distribution of a second size of the statistical targets, where the second set may include a plurality of statistical targets corresponding to each category and the size information of each statistical target; and determining a second probability distribution according to the probability distribution of the second size and a second relationship, where the second probability distribution is used to update the first probability distribution and the second relationship is the relationship between the size of a statistical target and the depth information of the millimeter-wave detection point corresponding to that statistical target. This third possible implementation shows that the data can be updated; for example, in an autonomous driving scenario, the probability distribution between depth information and each category can be determined from the updated data.
Optionally, with reference to the second or third possible implementation of the first aspect, in a fourth possible implementation, the size information is the height information of the statistical target. This fourth possible implementation gives a specific kind of size information, increasing the diversity of the solution.
Optionally, with reference to the first aspect or any one of the first to fourth possible implementations of the first aspect, in a fifth possible implementation, the position information is used, in combination with the distribution characteristics of millimeter-wave detection points on a vehicle, to determine the position of a candidate box in the image to be processed. When determining the position of a candidate box in the image to be processed, taking into account the distribution characteristics of millimeter-wave detection points on a vehicle allows the position of the detection target in the image to be determined better from the positions of the detection points. For example, if the distribution characteristics show that a millimeter-wave detection point is generally located at the lower-left corner of the target vehicle, multiple prior boxes can be determined with the detection point at the lower-left corner of each prior box. If the distribution characteristics are ignored and the position of the prior box is determined arbitrarily from the detection point, for example with the detection point at the upper-right corner of each prior box, the probability that the prior boxes contain the detection target will decrease.
Optionally, with reference to the first aspect or any one of the first to fifth possible implementations of the first aspect, in a sixth possible implementation, the depth information is used to determine the size of the candidate box, and the size of the candidate box is negatively correlated with the depth information.
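Combining the fifth and sixth implementations, a candidate-box generator might anchor boxes at the mapped detection point and scale them inversely with depth. A sketch under the assumption that detection points tend to fall at the lower-left corner of a vehicle (the example given above); the base size and aspect ratios are arbitrary placeholders:

```python
def candidate_boxes(u, v, depth_m, base_size=2000.0, aspect_ratios=(0.5, 1.0, 2.0)):
    """Generate candidate boxes (x1, y1, x2, y2) anchored so that the mapped
    radar point (u, v) sits at each box's lower-left corner (the assumed
    distribution characteristic of radar returns on a vehicle), with box
    size inversely proportional to depth."""
    side = base_size / depth_m  # farther targets get smaller boxes
    boxes = []
    for ar in aspect_ratios:
        w, h = side * ar, side
        # Lower-left corner at (u, v): the box extends right and up in
        # image coordinates (y grows downward).
        boxes.append((u, v - h, u + w, v))
    return boxes

near = candidate_boxes(100.0, 400.0, depth_m=10.0)  # larger boxes
far = candidate_boxes(100.0, 400.0, depth_m=40.0)   # smaller boxes
```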
Optionally, with reference to the first aspect or any one of the first to sixth possible implementations of the first aspect, in a seventh possible implementation, the method may further include: processing the image to be processed with a faster regions with convolution neural network (Faster-RCNN) to obtain a first feature map of the image to be processed; extracting, from the first feature map, second feature maps corresponding to the plurality of candidate boxes; and processing the second feature maps with a regression network and a classifier to obtain a first result, the first result being used for the non-maximum suppression (NMS) processing.
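Extracting a per-candidate-box feature map from the full feature map corresponds to RoI pooling on the backbone output. The following is a crude NumPy sketch of that operation, not the actual Faster-RCNN implementation; the backbone stride and output size are assumptions:

```python
import numpy as np

def roi_pool(feature_map, box, out_size=2, stride=16):
    """Crude RoI max-pooling sketch: crop the region of a backbone feature
    map (C x H x W) covered by an image-space box (x1, y1, x2, y2), then
    max-pool the crop to out_size x out_size cells. `stride` is the assumed
    downsampling factor of the backbone."""
    c, h, w = feature_map.shape
    x1, y1, x2, y2 = (int(round(v / stride)) for v in box)
    x2, y2 = max(x2, x1 + 1), max(y2, y1 + 1)  # keep at least one cell
    crop = feature_map[:, max(y1, 0):min(y2, h), max(x1, 0):min(x2, w)]
    ch, cw = crop.shape[1], crop.shape[2]
    ys = np.linspace(0, ch, out_size + 1).astype(int)
    xs = np.linspace(0, cw, out_size + 1).astype(int)
    out = np.zeros((c, out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            cell = crop[:, ys[i]:max(ys[i + 1], ys[i] + 1),
                           xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[:, i, j] = cell.max(axis=(1, 2))
    return out

# Toy 8x8 single-channel feature map with values 0..63.
fm = np.arange(64, dtype=float).reshape(1, 8, 8)
feat = roi_pool(fm, box=(0, 0, 64, 64), out_size=2)  # box covers cells 0..4
```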
Optionally, with reference to the first aspect or any one of the first to seventh possible implementations of the first aspect, in an eighth possible implementation, the image to be processed is acquired by a vision sensor; the sampling frequency of the vision sensor is a first frequency, the sampling frequency of the millimeter-wave radar is a second frequency, and the difference between the first frequency and the second frequency is not greater than a preset threshold.
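The synchronization condition in this implementation reduces to a simple comparison; the threshold value below is an arbitrary placeholder:

```python
def is_synchronized(f_vision_hz, f_radar_hz, threshold_hz=2.0):
    """True when the sampling-frequency difference between the vision sensor
    and the millimeter-wave radar does not exceed the preset threshold
    (the 2 Hz default is an arbitrary placeholder)."""
    return abs(f_vision_hz - f_radar_hz) <= threshold_hz

ok = is_synchronized(30.0, 31.0)   # within threshold
bad = is_synchronized(30.0, 35.0)  # exceeds threshold
```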
A second aspect of the present application provides a target determination device, which may include: an acquisition module configured to acquire an image to be processed and a plurality of millimeter-wave detection points, where the image to be processed and the plurality of millimeter-wave detection points are data acquired synchronously for the same target, each millimeter-wave detection point may include depth information, the depth information represents the distance between the detection target and the millimeter-wave radar, and the millimeter-wave radar is used to acquire the plurality of millimeter-wave detection points; a mapping module configured to map the plurality of millimeter-wave detection points acquired by the acquisition module onto the image to be processed acquired by the acquisition module; and a processing module configured to determine a plurality of candidate boxes of the detection target on the image to be processed according to first information, where the first information may include the depth information and position information of each millimeter-wave detection point, and the position information represents the position at which each millimeter-wave detection point is mapped onto the image to be processed. The processing module is further configured to perform non-maximum suppression (NMS) processing on the plurality of candidate boxes according to the depth information, so as to output a target box and a target millimeter-wave detection point, the target box being determined according to the depth information and position information of the target millimeter-wave detection point.
Optionally, with reference to the second aspect, in a first possible implementation, the processing module is specifically configured to: perform non-maximum suppression (NMS) processing on the plurality of candidate boxes according to a first score and a second score, where the first score represents the probability, determined by a classifier, that the detection target in each candidate box belongs to each of N categories, the N categories being preset categories and N being a positive integer, and the second score represents the probability, determined according to the depth information and a first probability distribution between the depth information and each category, that the detection target in each candidate box belongs to each of the N categories.
Optionally, with reference to the first possible implementation of the second aspect, in a second possible implementation, the target determination device may further include a statistics module configured to: perform statistics on the data in a first set to determine, for each category, a probability distribution of a first size of the statistical targets, where the first set may include a plurality of statistical targets corresponding to each category and the size information of each statistical target; and determine the first probability distribution according to the probability distribution of the first size and a first relationship, the first relationship being the relationship between the size of a statistical target and the depth information of the millimeter-wave detection point corresponding to that statistical target.
Optionally, with reference to the second possible implementation of the second aspect, in a third possible implementation, the statistics module is further configured to: perform statistics on the data in a second set to determine, for each category, a probability distribution of a second size of the statistical targets, where the second set may include a plurality of statistical targets corresponding to each category and the size information of each statistical target; and determine a second probability distribution according to the probability distribution of the second size and a second relationship, where the second probability distribution is used to update the first probability distribution and the second relationship is the relationship between the size of a statistical target and the depth information of the millimeter-wave detection point corresponding to that statistical target.
Optionally, with reference to the second or third possible implementation of the second aspect, in a fourth possible implementation, the size information is the height information of the statistical target.
Optionally, with reference to the second aspect or any one of the first to fourth possible implementations of the second aspect, in a fifth possible implementation, the position information is used, in combination with the distribution characteristics of millimeter-wave detection points on a vehicle, to determine the position of a candidate box in the image to be processed.
Optionally, with reference to the second aspect or any one of the first to fifth possible implementations of the second aspect, in a sixth possible implementation, the depth information is used to determine the size of the candidate box, and the size of the candidate box is negatively correlated with the depth information.
Optionally, with reference to the second aspect or any one of the first to sixth possible implementations of the second aspect, in a seventh possible implementation, the processing module is further configured to: process the image to be processed with a Faster-RCNN to obtain a first feature map of the image to be processed; extract, from the first feature map, second feature maps corresponding to the plurality of candidate boxes; and process the second feature maps with a regression network and a classifier to obtain a first result, the first result being used for the non-maximum suppression (NMS) processing.
Optionally, with reference to the second aspect or any one of the first to seventh possible implementations of the second aspect, in an eighth possible implementation, the image to be processed is acquired by a vision sensor; the sampling frequency of the vision sensor is a first frequency, the sampling frequency of the millimeter-wave radar is a second frequency, and the difference between the first frequency and the second frequency is not greater than a preset threshold.
本申请第三方面提供一种智能汽车，智能汽车可以包括处理器，处理器和存储器耦合，存储器存储有程序指令，当存储器存储的程序指令被处理器执行时，执行第一方面或第一方面任意一种可能的实施方式中描述的方法。A third aspect of the present application provides a smart car. The smart car may include a processor coupled with a memory, and the memory stores program instructions. When the program instructions stored in the memory are executed by the processor, the method described in the first aspect or any possible implementation manner of the first aspect is performed.
本申请第四方面提供一种监控设备，监控设备可以包括处理器，处理器和存储器耦合，存储器存储有程序指令，当存储器存储的程序指令被处理器执行时，执行第一方面或第一方面任意一种可能的实施方式中描述的方法。A fourth aspect of the present application provides a monitoring device. The monitoring device may include a processor coupled with a memory, and the memory stores program instructions. When the program instructions stored in the memory are executed by the processor, the method described in the first aspect or any possible implementation manner of the first aspect is performed.
本申请第五方面提供一种计算机可读存储介质,可以包括程序,当其在计算机上运行时,使得计算机执行如第一方面或第一方面任意一种可能的实施方式中描述的方法。A fifth aspect of the present application provides a computer-readable storage medium, which may include a program that, when executed on a computer, causes the computer to execute the method described in the first aspect or any possible implementation manner of the first aspect.
本申请第六方面提供一种目标确定系统，目标确定系统可以包括端侧设备和云侧设备，端侧设备，用于获取待处理图像和多个毫米波探测点，待处理图像和多个毫米波探测点是针对相同检测目标同步获取的数据，每个毫米波探测点可以包括深度信息，深度信息用于表示检测目标与毫米波雷达的距离，毫米波雷达用于获取多个毫米波探测点。云侧设备，用于接收端侧设备发送的待处理图像和多个毫米波探测点。云侧设备，还用于将多个毫米波探测点映射到待处理图像上。云侧设备，还用于根据第一信息确定检测目标在待处理图像上的多个候选框，第一信息可以包括每个毫米波探测点的深度信息和位置信息，位置信息用于表示每个毫米波探测点映射在待处理图像上的位置。云侧设备，还用于根据深度信息对多个候选框进行非极大值抑制NMS处理，以输出目标框和目标毫米波探测点，目标框是根据目标毫米波探测点的深度信息和位置信息确定的。A sixth aspect of the present application provides a target determination system. The target determination system may include an end-side device and a cloud-side device. The end-side device is configured to acquire a to-be-processed image and a plurality of millimeter-wave detection points, where the to-be-processed image and the plurality of millimeter-wave detection points are data acquired synchronously for the same detection target; each millimeter-wave detection point may include depth information, the depth information is used to indicate the distance between the detection target and a millimeter-wave radar, and the millimeter-wave radar is used to acquire the plurality of millimeter-wave detection points. The cloud-side device is configured to receive the to-be-processed image and the plurality of millimeter-wave detection points sent by the end-side device. The cloud-side device is further configured to map the plurality of millimeter-wave detection points onto the to-be-processed image. The cloud-side device is further configured to determine, according to first information, a plurality of candidate frames of the detection target on the to-be-processed image, where the first information may include depth information and position information of each millimeter-wave detection point, and the position information is used to indicate the position at which each millimeter-wave detection point is mapped on the to-be-processed image. The cloud-side device is further configured to perform non-maximum suppression (NMS) processing on the plurality of candidate frames according to the depth information, so as to output a target frame and a target millimeter-wave detection point, where the target frame is determined according to the depth information and position information of the target millimeter-wave detection point.
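The pipeline of the sixth aspect (mapping radar points into the image, generating candidate boxes whose size shrinks with depth, then suppressing overlaps) can be sketched as follows. This is an illustrative sketch, not the patented implementation; the camera intrinsics `focal`, `cx`, `cy`, the reference size, and the IoU threshold are all assumptions.

```python
# Illustrative sketch of the depth-guided candidate-box pipeline described above.
# All function names and constants are assumptions, not the patent's definitions.

def project_point(point, focal=1000.0, cx=640.0, cy=360.0):
    """Pinhole projection of a radar point (x lateral, y height, z depth, metres)."""
    x, y, z = point
    return (focal * x / z + cx, focal * y / z + cy, z)

def candidate_box(u, v, depth, ref_size=2.0, focal=1000.0):
    """Candidate box centered on the mapped point; its side length is
    negatively correlated with depth, as stated above."""
    side = focal * ref_size / depth
    return (u - side / 2, v - side / 2, u + side / 2, v + side / 2, depth)

def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2, ...)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter) if inter else 0.0

def depth_nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS: keep the highest-scoring box among overlapping candidates."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in kept):
            kept.append(i)
    return kept
```

The kept indices identify both the target frame and the radar point that generated it, which is how the target frame and target millimeter-wave detection point can be output together.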
可选地，结合上述第六方面，在第一种可能的实施方式中，云侧设备，具体用于根据第一分数和第二分数对多个候选框进行非极大值抑制NMS处理，第一分数表示根据分类器确定的、每个候选框中的检测目标属于N个类别中的每个类别的概率，N个类别为预先设定的类别，N为正整数，第二分数表示根据深度信息与每个类别之间的第一概率分布确定的、每个候选框中的检测目标属于N个类别中的每个类别的概率。Optionally, in combination with the above sixth aspect, in a first possible implementation manner, the cloud-side device is specifically configured to perform non-maximum suppression (NMS) processing on the plurality of candidate frames according to a first score and a second score, where the first score represents the probability, determined by the classifier, that the detection target in each candidate frame belongs to each of N categories, the N categories are preset categories, N is a positive integer, and the second score represents the probability, determined according to a first probability distribution between the depth information and each category, that the detection target in each candidate frame belongs to each of the N categories.
可选地，结合上述第六方面第一种可能的实现方式，在第二种可能的实现方式中，云侧设备，还用于对第一集合中的数据进行统计，确定每个类别对应的统计目标的第一尺寸的概率分布，第一集合可以包括每个类别对应的多个统计目标，以及每个统计目标的尺寸信息。根据第一尺寸的概率分布以及第一关系确定第一概率分布，第一关系为统计目标的尺寸与统计目标对应的毫米波探测点的深度信息之间的关系。Optionally, in combination with the first possible implementation manner of the sixth aspect, in a second possible implementation manner, the cloud-side device is further configured to perform statistics on the data in a first set to determine a probability distribution of a first size of statistical targets corresponding to each category, where the first set may include a plurality of statistical targets corresponding to each category and size information of each statistical target; and to determine the first probability distribution according to the probability distribution of the first size and a first relationship, where the first relationship is the relationship between the size of a statistical target and the depth information of the millimeter-wave detection point corresponding to the statistical target.
可选地，结合上述第六方面第二种可能的实现方式，在第三种可能的实现方式中，云侧设备，还用于对第二集合中的数据进行统计，确定每个类别对应的统计目标的第二尺寸的概率分布，第二集合可以包括每个类别对应的多个统计目标，以及每个统计目标的尺寸信息。根据第二尺寸分布以及第二关系确定第二概率分布，所述第二概率分布用于更新第一概率分布，第二关系为统计目标的尺寸与统计目标对应的毫米波探测点的深度信息之间的关系。Optionally, in combination with the second possible implementation manner of the sixth aspect, in a third possible implementation manner, the cloud-side device is further configured to perform statistics on the data in a second set to determine a probability distribution of a second size of statistical targets corresponding to each category, where the second set may include a plurality of statistical targets corresponding to each category and size information of each statistical target; and to determine a second probability distribution according to the second size distribution and a second relationship, where the second probability distribution is used to update the first probability distribution, and the second relationship is the relationship between the size of a statistical target and the depth information of the millimeter-wave detection point corresponding to the statistical target.
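One way to realize the depth-based second score described above is to combine per-class real-world size statistics with the pinhole relation between real height, depth, and image height, then multiply it with the classifier's first score. The sketch below assumes Gaussian height statistics and a focal length; the class names, statistics, and the product combination rule are illustrative assumptions, not the patent's definitions.

```python
import math

# Sketch of the two-score idea: the classifier gives a first score per class,
# and a per-class height distribution combined with the radar depth gives a
# second score via image_height ~ focal * real_height / depth (all values assumed).

CLASS_HEIGHT_STATS = {          # (mean, std) in metres; illustrative statistics
    "pedestrian": (1.7, 0.1),
    "car": (1.5, 0.15),
    "truck": (3.2, 0.4),
}

def second_score(box_height_px, depth_m, cls, focal=1000.0):
    """Likelihood that a box of this pixel height at this depth is class `cls`."""
    mu, sigma = CLASS_HEIGHT_STATS[cls]
    implied_height = box_height_px * depth_m / focal   # invert the pinhole relation
    return math.exp(-0.5 * ((implied_height - mu) / sigma) ** 2)

def fused_score(first_scores, box_height_px, depth_m):
    """Combine classifier score and depth-based score per class (product rule)."""
    return {c: first_scores[c] * second_score(box_height_px, depth_m, c)
            for c in first_scores}
```

For example, a box 85 px tall at 20 m depth implies a 1.7 m tall object, so the "pedestrian" class receives the highest depth-based score even if the classifier alone is uncertain.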
可选地,结合上述第六方面第二种或第六方面第三种可能的实施方式,在第四种可能的实施方式中,尺寸信息为统计目标的高度信息。Optionally, in combination with the second possible implementation manner of the sixth aspect or the third possible implementation manner of the sixth aspect, in a fourth possible implementation manner, the size information is height information of the statistical target.
可选地，结合上述第六方面或第六方面第一种至第六方面第四种可能的实施方式，在第五种可能的实施方式中，位置信息用于结合毫米波探测点在车辆上的分布特性确定候选框在待处理图像中的位置。Optionally, in combination with the sixth aspect or any one of the first to fourth possible implementation manners of the sixth aspect, in a fifth possible implementation manner, the position information is used, in combination with the distribution characteristics of millimeter-wave detection points on a vehicle, to determine the position of the candidate frame in the to-be-processed image.
可选地，结合上述第六方面或第六方面第一种至第六方面第五种可能的实施方式，在第六种可能的实施方式中，深度信息用于确定候选框的尺寸，候选框的尺寸与深度信息负相关。Optionally, in combination with the sixth aspect or any one of the first to fifth possible implementation manners of the sixth aspect, in a sixth possible implementation manner, the depth information is used to determine the size of the candidate frame, and the size of the candidate frame is negatively correlated with the depth information.
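As an illustration of how position and depth information could shape the candidate frames (the fifth and sixth implementation manners above), the sketch below proposes several boxes around each mapped radar point, since a millimeter-wave return may come from the rear bumper, a corner, or a wheel arch of a vehicle rather than its center; box width is inversely proportional to depth. The offsets, aspect ratio, and reference width are hypothetical, not taken from the patent.

```python
# Hypothetical offsets reflecting where radar returns tend to land on a vehicle:
# centered, shifted left, shifted right, shifted upward (fractions of box size).
OFFSETS = [(0.0, 0.0), (-0.5, 0.0), (0.5, 0.0), (0.0, -0.5)]

def proposals_from_point(u, v, depth_m, focal=1000.0, ref_width_m=1.8):
    """Candidate boxes around a mapped radar point; width shrinks with depth."""
    w = focal * ref_width_m / depth_m   # pixel width, negatively correlated with depth
    h = 0.8 * w                         # assumed aspect ratio
    boxes = []
    for dx, dy in OFFSETS:
        cu, cv = u + dx * w, v + dy * h
        boxes.append((cu - w / 2, cv - h / 2, cu + w / 2, cv + h / 2))
    return boxes
```

Doubling the depth halves the proposed box width, matching the stated negative correlation.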
可选地，结合上述第六方面或第六方面第一种至第六方面第六种可能的实施方式，在第七种可能的实施方式中，云侧设备，还用于通过Faster-RCNN对待处理图像进行处理，得到待处理图像的第一特征图。从第一特征图中提取多个候选框对应的第二特征图。通过回归网络和分类器对第二特征图进行处理，以得到第一结果，第一结果用于进行非极大值抑制NMS处理。Optionally, in combination with the sixth aspect or any one of the first to sixth possible implementation manners of the sixth aspect, in a seventh possible implementation manner, the cloud-side device is further configured to process the to-be-processed image through Faster-RCNN to obtain a first feature map of the to-be-processed image, extract, from the first feature map, second feature maps corresponding to the plurality of candidate frames, and process the second feature maps through a regression network and a classifier to obtain a first result, where the first result is used for non-maximum suppression (NMS) processing.
可选地，结合上述第六方面或第六方面第一种至第六方面第七种可能的实施方式，在第八种可能的实施方式中，端侧设备通过视觉传感器获取待处理图像，视觉传感器的采样频率为第一频率，毫米波雷达的采样频率为第二频率，第一频率和第二频率的差值不大于预设阈值。Optionally, in combination with the sixth aspect or any one of the first to seventh possible implementation manners of the sixth aspect, in an eighth possible implementation manner, the end-side device acquires the to-be-processed image through a visual sensor, the sampling frequency of the visual sensor is a first frequency, the sampling frequency of the millimeter-wave radar is a second frequency, and the difference between the first frequency and the second frequency is not greater than a preset threshold.
本申请第七方面提供一种模型训练方法，可以包括：获取训练图像和多个毫米波探测点，训练图像和多个毫米波探测点是针对相同检测目标同步获取的数据，每个毫米波探测点可以包括深度信息，深度信息用于表示检测目标与毫米波雷达的距离，毫米波雷达用于获取多个毫米波探测点。将多个毫米波探测点映射到训练图像上。根据第一信息确定检测目标在训练图像上的多个候选框，第一信息可以包括每个毫米波探测点的深度信息和位置信息，位置信息用于表示每个毫米波探测点映射在训练图像上的位置。根据多个候选框对应的特征图对模型进行训练。A seventh aspect of the present application provides a model training method, which may include: acquiring a training image and a plurality of millimeter-wave detection points, where the training image and the plurality of millimeter-wave detection points are data acquired synchronously for the same detection target; each millimeter-wave detection point may include depth information, the depth information is used to indicate the distance between the detection target and a millimeter-wave radar, and the millimeter-wave radar is used to acquire the plurality of millimeter-wave detection points; mapping the plurality of millimeter-wave detection points onto the training image; determining, according to first information, a plurality of candidate frames of the detection target on the training image, where the first information may include depth information and position information of each millimeter-wave detection point, and the position information is used to indicate the position at which each millimeter-wave detection point is mapped on the training image; and training a model according to feature maps corresponding to the plurality of candidate frames.
可选地,结合上述第七方面,在第一种可能的实施方式中,位置信息用于结合毫米波探测点在车辆上的分布特性确定候选框在训练图像中的位置。Optionally, in combination with the above seventh aspect, in a first possible implementation manner, the position information is used to determine the position of the candidate frame in the training image in combination with the distribution characteristics of the millimeter wave detection points on the vehicle.
可选地，结合上述第七方面或第七方面第一种可能的实施方式，在第二种可能的实施方式中，深度信息用于确定候选框的尺寸，候选框的尺寸与深度信息负相关。Optionally, in combination with the seventh aspect or the first possible implementation manner of the seventh aspect, in a second possible implementation manner, the depth information is used to determine the size of the candidate frame, and the size of the candidate frame is negatively correlated with the depth information.
可选地，结合上述第七方面或第七方面第一种或第七方面第二种可能的实施方式，在第三种可能的实施方式中，还可以包括对训练图像进行卷积处理，得到训练图像的第一特征图。从第一特征图中提取多个候选框对应的第二特征图，根据第二特征图对模型进行训练。Optionally, in combination with the seventh aspect or the first or second possible implementation manner of the seventh aspect, in a third possible implementation manner, the method may further include: performing convolution processing on the training image to obtain a first feature map of the training image; extracting, from the first feature map, second feature maps corresponding to the plurality of candidate frames; and training the model according to the second feature maps.
可选地，结合上述第七方面或第七方面第一种至第七方面第三种可能的实施方式，在第四种可能的实施方式中，通过视觉传感器获取训练图像，视觉传感器的采样频率为第一频率，毫米波雷达的采样频率为第二频率，第一频率和第二频率的差值不大于预设阈值。Optionally, in combination with the seventh aspect or any one of the first to third possible implementation manners of the seventh aspect, in a fourth possible implementation manner, the training image is acquired by a visual sensor, the sampling frequency of the visual sensor is a first frequency, the sampling frequency of the millimeter-wave radar is a second frequency, and the difference between the first frequency and the second frequency is not greater than a preset threshold.
附图说明Description of drawings
图1a为目标级别的检测结果融合的流程示意图;Figure 1a is a schematic flowchart of the fusion of detection results at the target level;
图1b为特征级融合的流程示意图;Figure 1b is a schematic flowchart of feature-level fusion;
图2为异构传感器在不同维度的检测性能的示意图;Figure 2 is a schematic diagram of the detection performance of heterogeneous sensors in different dimensions;
图3为本申请实施例提供的一种卷积神经网络的结构示意图;3 is a schematic structural diagram of a convolutional neural network provided by an embodiment of the present application;
图4为本申请实施例提供的另一种卷积神经网络的结构示意图;4 is a schematic structural diagram of another convolutional neural network provided by an embodiment of the present application;
图5为一种高效区域卷积神经网络的示意图;5 is a schematic diagram of an efficient regional convolutional neural network;
图6为本申请实施例提供的一种目标确定方法的流程示意图;6 is a schematic flowchart of a target determination method provided by an embodiment of the present application;
图7a为本申请提供的一种目标确定方法的应用场景的示意图;7a is a schematic diagram of an application scenario of a target determination method provided by the present application;
图7b为本申请提供的另一种目标确定方法的应用场景的示意图;7b is a schematic diagram of an application scenario of another target determination method provided by the present application;
图7c为本申请提供的另一种目标确定方法的应用场景的示意图;7c is a schematic diagram of an application scenario of another target determination method provided by the present application;
图7d为本申请提供的另一种目标确定方法的应用场景的示意图;7d is a schematic diagram of an application scenario of another target determination method provided by the present application;
图7e为本申请提供的另一种目标确定方法的应用场景的示意图;FIG. 7e is a schematic diagram of an application scenario of another target determination method provided by the present application;
图8为本申请实施例提供的另一种目标确定方法的流程示意图;8 is a schematic flowchart of another target determination method provided by an embodiment of the present application;
图9a为本申请提供的另一种目标确定方法的应用场景的示意图;9a is a schematic diagram of an application scenario of another target determination method provided by the present application;
图9b为本申请提供的另一种目标确定方法的应用场景的示意图;9b is a schematic diagram of an application scenario of another target determination method provided by the present application;
图10为本申请提供的第一尺寸的概率分布的示意图;10 is a schematic diagram of the probability distribution of the first size provided by the application;
图11为本申请实施例提供的另一种目标确定方法的流程示意图;11 is a schematic flowchart of another target determination method provided by an embodiment of the present application;
图12为本申请提供的一种目标确定方法的应用场景的示意图;12 is a schematic diagram of an application scenario of a target determination method provided by the present application;
图13为本申请实施例提供的方案与其他方案的效果对比图;Fig. 13 is the effect comparison diagram of the scheme provided by the embodiment of the present application and other schemes;
图14为本申请实施例提供的一种模型训练方法的流程示意图;14 is a schematic flowchart of a model training method provided by an embodiment of the present application;
图15为本申请提供的一种目标确定装置的结构示意图;15 is a schematic structural diagram of a target determination device provided by the application;
图16为本申请提供的一种模型训练装置的结构示意图;16 is a schematic structural diagram of a model training device provided by the application;
图17为本申请提供的另一种目标确定装置的结构示意图;17 is a schematic structural diagram of another target determination device provided by the application;
图18为本申请实施例提供的芯片的一种结构示意图。FIG. 18 is a schematic structural diagram of a chip provided by an embodiment of the present application.
具体实施方式Detailed Description
下面结合附图,对本申请的实施例进行描述,显然,所描述的实施例仅仅是本申请一部分的实施例,而不是全部的实施例。本领域普通技术人员可知,随着技术的发展和新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。The embodiments of the present application will be described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all the embodiments. Those of ordinary skill in the art know that with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.
为了更好的理解本申请提供的方案可以适用的领域以及场景,在对本申请提供的技术方案进行具体的介绍之前,首先对多传感器信息融合的相关知识进行介绍。In order to better understand the applicable fields and scenarios of the solution provided by this application, before the specific introduction of the technical solution provided by this application, the related knowledge of multi-sensor information fusion is first introduced.
多传感器信息融合(multi-sensor information fusion，MSIF)是利用计算机技术将来自多传感器或多源的信息或数据，在一定的准则下加以自动分析和综合，以完成所需要的决策和估计而进行的信息处理过程。传感器数据融合的定义可以概括为把分布在不同位置的多个同类或不同类传感器所提供的局部数据资源加以综合，采用计算机技术对其进行分析，消除多传感器信息之间可能存在的冗余和矛盾，加以互补，降低其不确实性，获得被测目标的一致性解释与描述，从而提高系统决策、规划、反应的快速性和正确性，使系统获得更充分的信息。本申请有时也将同类传感器称为同构传感器，将不同类传感器称为异构传感器，在不强调二者的区别之时，二者表示相同的意思。此外需要说明的是，本申请有时也将多传感器信息融合称为多传感器数据融合，或者多传感器融合，在不强调他们的区别之时，他们表示相同的意思。Multi-sensor information fusion (MSIF) is an information processing process that uses computer technology to automatically analyze and synthesize information or data from multiple sensors or multiple sources under certain criteria, in order to complete the required decision-making and estimation. Sensor data fusion can be defined as synthesizing the local data resources provided by multiple sensors of the same or different types distributed at different locations, analyzing them with computer technology, eliminating possible redundancy and contradiction between the multi-sensor information, making the information complementary, reducing its uncertainty, and obtaining a consistent interpretation and description of the measured target, thereby improving the rapidity and correctness of system decision-making, planning, and response, and enabling the system to obtain more complete information. In this application, sensors of the same type are sometimes referred to as homogeneous sensors, and sensors of different types as heterogeneous sensors; when the difference between the two is not emphasized, the two terms have the same meaning. In addition, it should be noted that this application sometimes refers to multi-sensor information fusion as multi-sensor data fusion, or multi-sensor fusion; when their differences are not emphasized, these terms have the same meaning.
传感器的信息融合可以针对不同层次的信息进行融合,比如可以包括目标级别的检测结果融合(high-level fusion),以及特征级融合(feature-level fusion)。其中,high-level fusion指从单个传感器的数据得到目标级别的检测结果后,将多个同构或异构传感器的目标级别的检测结果融合。feature-level fusion指在对单个传感器的测量数据进行特征提取后为形成目标级别的检测结果前,将多个同构或异构传感器的提取特征进行融合。下面结合图1a和图1b进行说明,图1a为目标级别的检测结果融合的流程示意图,图1b为特征级融合的流程示意图。如图1a所示,假设有多个传感器,分别是第一传感器,第二传感器和第三传感器。通过第一感知算法对第一传感器获取的数据进行处理,以输出目标的第一目标级检测结果。通过第二感知算法对第二传感器获取的数据进行处理,以输出该目标的第二目标级检测结果。通过第三感知算法对第三传感器获取的数据进行处理,以输出该目标的第三目标级检测结果。再对第一目标级检测结果,第二目标级检测结果以及第三目标级检测结果进行融合处理。对于目标级别的检测结果的融合,每个传感器各自独立处理生成目标数据。每个传感器都有自己独立的感知,比如激光雷达有激光雷达的感知,摄像头有摄像头的感知,毫米波雷达也会做出自己的感知。当所有传感器完成目标数据生成后,再由主处理器进行数据融合。如图1b所示,假设有多个传感器,分别是第一传感器,第二传感器和第三传感器。在特征级融合的场景中,只有一个感知的算法,对融合后的多维综合数据进行感知。由于只有一个感知的算法,需要对每一个传感器获取的数据进行时间的同步以及空间的同步。其中,时间的同步是为了保证不同传感器数据采集的数据在时间上是同步的,空间的同步是为了使不同传感器基于各自坐标系下的测量值转换到同一个坐标系上去,即,坐标系的统一。Sensor information fusion can be used for information fusion at different levels, such as target-level detection result fusion (high-level fusion) and feature-level fusion (feature-level fusion). Among them, high-level fusion refers to the fusion of target-level detection results of multiple homogeneous or heterogeneous sensors after obtaining target-level detection results from the data of a single sensor. Feature-level fusion refers to the fusion of the extracted features of multiple homogeneous or heterogeneous sensors after the feature extraction of the measurement data of a single sensor to form the target-level detection result. 1a and FIG. 1b are described below. FIG. 1a is a schematic flowchart of target-level detection result fusion, and FIG. 1b is a schematic flowchart of feature-level fusion. As shown in Figure 1a, it is assumed that there are multiple sensors, namely the first sensor, the second sensor and the third sensor. The data acquired by the first sensor is processed by the first perception algorithm to output the first target-level detection result of the target. 
The data acquired by the second sensor is processed by the second perception algorithm to output the second target-level detection result of the target. The data acquired by the third sensor is processed by the third perception algorithm to output the third target-level detection result of the target. The first, second, and third target-level detection results are then fused. For the fusion of target-level detection results, each sensor independently processes its data to generate target data. Each sensor has its own independent perception: for example, the lidar has its own perception, the camera has its own perception, and the millimeter-wave radar also makes its own perception. After all sensors have generated their target data, the main processor performs data fusion. As shown in Fig. 1b, it is assumed that there are multiple sensors, namely a first sensor, a second sensor, and a third sensor. In the feature-level fusion scenario, there is only one perception algorithm, which perceives the fused multi-dimensional comprehensive data. Since there is only one perception algorithm, the data acquired by each sensor needs to be synchronized in time and in space. Time synchronization ensures that the data collected by different sensors are synchronized in time, and spatial synchronization converts the measurement values of different sensors from their respective coordinate systems into the same coordinate system, that is, the unification of coordinate systems.
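The time-synchronization step described above can be sketched as pairing each camera frame with the radar sweep closest in time and discarding pairs whose gap exceeds a threshold. The function name and threshold value are assumptions for illustration only.

```python
# Sketch of nearest-timestamp pairing for time synchronization between two
# sensors (e.g. camera frames and radar sweeps); max_gap is an assumed tolerance.

def pair_by_timestamp(cam_ts, radar_ts, max_gap=0.025):
    """Return (camera_index, radar_index) pairs with |dt| <= max_gap seconds."""
    pairs = []
    for i, t in enumerate(cam_ts):
        j = min(range(len(radar_ts)), key=lambda k: abs(radar_ts[k] - t))
        if abs(radar_ts[j] - t) <= max_gap:
            pairs.append((i, j))
    return pairs
```

A camera frame with no radar sweep inside the tolerance window is simply left unpaired, which keeps the fused data consistent in time.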
多传感器数据融合虽然未形成完整的理论体系和有效的融合算法，但在不少应用领域根据各自的具体应用背景，已经提出了许多成熟并且有效的融合方法。多传感器数据融合的常用方法基本上可概括为随机和人工智能两大类，随机类方法有加权平均法、卡尔曼滤波法、多贝叶斯估计法、证据推理、产生式规则等；而人工智能类则有模糊逻辑理论、神经网络、粗集理论、专家系统等。Although multi-sensor data fusion has not yet formed a complete theoretical system or a universally effective fusion algorithm, many mature and effective fusion methods have been proposed in numerous application fields according to their specific application backgrounds. The common methods of multi-sensor data fusion can basically be summarized into two categories: stochastic methods and artificial-intelligence methods. Stochastic methods include the weighted average method, Kalman filtering, multi-Bayesian estimation, evidential reasoning, production rules, and so on, while the artificial-intelligence category includes fuzzy logic theory, neural networks, rough set theory, expert systems, and so on.
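As a concrete instance of the weighted average method listed above, two sensors' estimates of the same quantity can be fused with weights inversely proportional to their measurement variances. The variance values below are illustrative, not drawn from the application.

```python
# Minimal sketch of weighted-average sensor fusion: a more certain
# (lower-variance) sensor contributes more to the fused estimate.

def weighted_average_fusion(estimates, variances):
    """Fuse scalar estimates with inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    return sum(e * w for e, w in zip(estimates, weights)) / total
```

With equal variances this reduces to the plain arithmetic mean; with unequal variances the fused value is pulled toward the more reliable sensor.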
需要说明的是，要准确融合多传感器信息必须要进行目标在不同传感器的一一匹配对应，即进行多传感器目标匹配。完成多传感器目标匹配后通过融合即可得到目标的准确信息。本申请也将目标的匹配称为传感器输出数据的关联，在不强调二者的区别之时，二者表示相同的意思。本申请提供的方案关注的重点在于如何保证异构传感器的目标级检测结果之间关联的正确性或准确性，以得到更好的数据融合结果，即更好的保证后续输出鲁棒的检测结果。通常认为异构传感器如果在某一个维度（或者功能）的检测性能都好，二者的关联的准确性一般较高，下面结合图2进行说明。如图2所示，为异构传感器在不同维度的检测性能的示意图。图2中展示了3种传感器，摄像机，毫米波雷达，激光雷达，以及这三种传感器在7种不同维度的检测性能，该7种不同的维度分别是目标检测，目标识别，距离测量，物体边缘检测，车道跟踪，恶劣天气下的功能以及黑暗或者曝光严重时的功能。从图2中可以看到，毫米波雷达和激光雷达在目标检测上都有很好的检测性能，则对于目标检测这个功能，毫米波雷达和激光雷达的数据关联的准确性就会较高。再比如，对于物体边缘检测这个功能，毫米波雷达和激光雷达的数据关联的准确性就会较低。此外，从图2中可以看出，对于这7种功能，摄像机和毫米波雷达无法在某一个维度的检测性能都好，所以摄像机和毫米波雷达的数据关联的准确性通常都会较低。但同时摄像机和毫米波雷达的测量特性的优势互补效应也很好。因此如何使摄像机和毫米波雷达的检测结果进行关联，具有重要意义。It should be noted that, in order to accurately fuse multi-sensor information, one-to-one matching of targets across different sensors must be performed, that is, multi-sensor target matching. After multi-sensor target matching is completed, accurate information about the target can be obtained through fusion. This application also refers to the matching of targets as the association of sensor output data; when the difference between the two is not emphasized, the two terms have the same meaning. The solution provided in this application focuses on how to ensure the correctness or accuracy of the association between the target-level detection results of heterogeneous sensors, so as to obtain better data fusion results, that is, to better ensure that robust detection results are subsequently output. It is generally believed that if two heterogeneous sensors both have good detection performance in a certain dimension (or function), the accuracy of the association between the two is generally high, as described below with reference to FIG. 2, which is a schematic diagram of the detection performance of heterogeneous sensors in different dimensions. FIG. 2 shows three sensors, a camera, a millimeter-wave radar, and a lidar, and the detection performance of these three sensors in seven different dimensions: target detection, target recognition, distance measurement, object edge detection, lane tracking, functioning in inclement weather, and functioning in darkness or severe exposure. As can be seen from FIG. 2, both the millimeter-wave radar and the lidar have good detection performance in target detection, so for the target detection function the accuracy of the data association between the millimeter-wave radar and the lidar will be high. For another example, for the object edge detection function, the accuracy of the data association between the millimeter-wave radar and the lidar will be low. In addition, it can be seen from FIG. 2 that, for these seven functions, there is no dimension in which both the camera and the millimeter-wave radar perform well, so the accuracy of the data association between the camera and the millimeter-wave radar is usually low. At the same time, however, the measurement characteristics of the camera and the millimeter-wave radar complement each other very well. Therefore, how to associate the detection results of the camera and the millimeter-wave radar is of great significance.
本申请提供的方案需要通过神经网络对异构传感器的输出数据进行关联，以下将会涉及大量与神经网络相关的知识，为了更好的理解本申请提供的技术方案，下面对神经网络的相关知识进行介绍。需要说明的是，本申请提供的方案并不对神经网络的类型限定，任何一种可以用于目标检测的神经网络，本申请实施例均可以采用。The solution provided by this application needs to associate the output data of heterogeneous sensors through a neural network, and the following involves a large amount of neural-network-related knowledge. To better understand the technical solution provided by this application, the relevant knowledge of neural networks is introduced below. It should be noted that the solution provided in this application does not limit the type of neural network, and any neural network that can be used for target detection can be adopted in the embodiments of this application.
由于卷积神经网络(convolutional neuron network，CNN)是一种带有卷积结构的深度神经网络，是一种深度学习(deep learning)架构，深度学习架构是指通过机器学习的算法，在不同的抽象层级上进行多个层次的学习。作为一种深度学习架构，CNN是一种前馈(feed-forward)人工神经网络，该前馈人工神经网络中的各个神经元对输入其中的图像中的重叠区域作出响应。A convolutional neural network (CNN) is a deep neural network with a convolutional structure and is a deep learning architecture. A deep learning architecture refers to multiple levels of learning at different levels of abstraction through machine learning algorithms. As a deep learning architecture, a CNN is a feed-forward artificial neural network in which each neuron responds to overlapping regions in the image fed into it.
如图3所示,卷积神经网络(CNN)100可以包括输入层110,卷积层/池化层120,其中池化层为可选的,以及神经网络层130。As shown in FIG. 3 , a convolutional neural network (CNN) 100 may include an input layer 110 , a convolutional/pooling layer 120 , where the pooling layer is optional, and a neural network layer 130 .
卷积层/池化层120:Convolutional layer/pooling layer 120:
卷积层:Convolutional layer:
如图3所示卷积层/池化层120可以包括如示例121-126层，在一种实现中，121层为卷积层，122层为池化层，123层为卷积层，124层为池化层，125为卷积层，126为池化层；在另一种实现方式中，121、122为卷积层，123为池化层，124、125为卷积层，126为池化层。即卷积层的输出可以作为随后的池化层的输入，也可以作为另一个卷积层的输入以继续进行卷积操作。As shown in FIG. 3, the convolutional layer/pooling layer 120 may include layers 121-126 as an example. In one implementation, layer 121 is a convolutional layer, layer 122 is a pooling layer, layer 123 is a convolutional layer, layer 124 is a pooling layer, layer 125 is a convolutional layer, and layer 126 is a pooling layer. In another implementation, layers 121 and 122 are convolutional layers, layer 123 is a pooling layer, layers 124 and 125 are convolutional layers, and layer 126 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
以卷积层121为例，卷积层121可以包括很多个卷积算子，卷积算子也称为核，其在图像处理中的作用相当于一个从输入图像矩阵中提取特定信息的过滤器，卷积算子本质上可以是一个权重矩阵，这个权重矩阵通常被预先定义，在对图像进行卷积操作的过程中，权重矩阵通常在输入图像上沿着水平方向一个像素接着一个像素（或两个像素接着两个像素……这取决于步长stride的取值）的进行处理，从而完成从图像中提取特定特征的工作。该权重矩阵的大小应该与图像的大小相关，需要注意的是，权重矩阵的纵深维度（depth dimension）和输入图像的纵深维度是相同的，在进行卷积运算的过程中，权重矩阵会延伸到输入图像的整个深度。因此，和一个单一的权重矩阵进行卷积会产生一个单一纵深维度的卷积化输出，但是大多数情况下不使用单一权重矩阵，而是应用维度相同的多个权重矩阵。每个权重矩阵的输出被堆叠起来形成卷积图像的纵深维度。不同的权重矩阵可以用来提取图像中不同的特征，例如一个权重矩阵用来提取图像边缘信息，另一个权重矩阵用来提取图像的特定颜色，又一个权重矩阵用来对图像中不需要的噪点进行模糊化……该多个权重矩阵维度相同，经过该多个维度相同的权重矩阵提取后的特征图维度也相同，再将提取到的多个维度相同的特征图合并形成卷积运算的输出。Taking the convolutional layer 121 as an example, the convolutional layer 121 may include many convolution operators, also called kernels, whose role in image processing is equivalent to a filter that extracts specific information from the input image matrix. A convolution operator is essentially a weight matrix, which is usually predefined. In the process of convolving an image, the weight matrix is usually moved over the input image along the horizontal direction one pixel at a time (or two pixels at a time, depending on the value of the stride), thereby extracting specific features from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends over the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolutional output with a single depth dimension, but in most cases a single weight matrix is not used; instead, multiple weight matrices of the same dimensions are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image. Different weight matrices can be used to extract different features from the image: for example, one weight matrix is used to extract image edge information, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image. The multiple weight matrices have the same dimensions, the feature maps extracted by these weight matrices also have the same dimensions, and the extracted feature maps of the same dimensions are then combined to form the output of the convolution operation.
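The sliding-window operation described above can be sketched as follows for a single-channel input and one kernel with "valid" padding; a real convolutional layer would stack many such kernels along the depth dimension and learn their weights.

```python
# Minimal sketch of 2D convolution (cross-correlation): the kernel slides over
# the input with a given stride, and each position yields one output value.

def conv2d(image, kernel, stride=1):
    """Valid 2D cross-correlation of a single-channel image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(0, len(image) - kh + 1, stride):
        row = []
        for c in range(0, len(image[0]) - kw + 1, stride):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out
```

With a kernel such as `[[-1, 1], [-1, 1]]`, the output responds strongly at vertical edges and is zero in flat regions, illustrating how one weight matrix extracts edge information.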
这些权重矩阵中的权重值在实际应用中需要经过大量的训练得到,通过训练得到的权重值形成的各个权重矩阵可以从输入图像中提取信息,从而帮助卷积神经网络100进行正确的预测。The weight values in these weight matrices need to be obtained through a lot of training in practical applications, and each weight matrix formed by the weight values obtained by training can extract information from the input image, thereby helping the convolutional neural network 100 to make correct predictions.
当卷积神经网络100有多个卷积层的时候，初始的卷积层（例如121）往往提取较多的一般特征，该一般特征也可以称之为低级别的特征；随着卷积神经网络100深度的加深，越往后的卷积层（例如126）提取到的特征越来越复杂，比如高级别的语义之类的特征，语义越高的特征越适用于待解决的问题。When the convolutional neural network 100 has multiple convolutional layers, the initial convolutional layer (for example, 121) often extracts more general features, which may also be called low-level features. As the depth of the convolutional neural network 100 increases, the features extracted by later convolutional layers (for example, 126) become more and more complex, such as high-level semantic features; features with higher-level semantics are more applicable to the problem to be solved.
Pooling layer:
Since it is often necessary to reduce the number of training parameters, a pooling layer often needs to be introduced periodically after a convolutional layer. That is, in the layers 121-126 illustrated by 120 in FIG. 3, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. In image processing, the sole purpose of a pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a max pooling operator for sampling the input image to obtain an image of smaller size. The average pooling operator computes the average of the pixel values within a specific range of the image, and the max pooling operator takes the pixel with the largest value within a specific range as the result of max pooling. In addition, just as the size of the weight matrix used in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the output image represents the average or maximum value of the corresponding sub-region of the input image.
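The average and max pooling operators described above can be sketched roughly as follows; the window size is an illustrative assumption.

```python
import numpy as np

def pool2d(image, size=2, mode="max"):
    """Downsample each channel: every pixel of the output is the max or
    mean of the corresponding size x size sub-region of the input."""
    H, W, C = image.shape
    out = np.zeros((H // size, W // size, C))
    for i in range(H // size):
        for j in range(W // size):
            patch = image[i * size:(i + 1) * size, j * size:(j + 1) * size, :]
            if mode == "max":
                out[i, j] = patch.max(axis=(0, 1))   # max pooling operator
            else:
                out[i, j] = patch.mean(axis=(0, 1))  # average pooling operator
    return out
```

A 4x4 input pooled with a 2x2 window yields a 2x2 output, halving the spatial size as the text describes.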
Neural network layer 130:
After processing by the convolutional layers/pooling layers 120, the convolutional neural network 100 is not yet able to output the required output information, because, as described above, the convolutional layers/pooling layers 120 only extract features and reduce the number of parameters brought by the input image. To generate the final output information (the required class information or other related information), the convolutional neural network 100 needs to use the neural network layer 130 to generate an output, or a set of outputs, whose number equals the number of required classes. Therefore, the neural network layer 130 may include multiple hidden layers (131, 132 to 13n as shown in FIG. 3) and an output layer 140. The parameters contained in the multiple hidden layers may be obtained by pre-training on training data relevant to the specific task type; for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and so on.
After the multiple hidden layers in the neural network layer 130, the last layer of the entire convolutional neural network 100 is the output layer 140. The output layer 140 has a loss function similar to categorical cross-entropy and is specifically used to compute the prediction error. Once the forward propagation of the entire convolutional neural network 100 (propagation from 110 to 140 in FIG. 3) is completed, back propagation (propagation from 140 to 110 in FIG. 3) starts updating the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 100 and the error between the result output by the convolutional neural network 100 through the output layer and the ideal result.
It should be noted that the convolutional neural network 100 shown in FIG. 3 is only one example of a convolutional neural network. In specific applications, the convolutional neural network may also exist in the form of other network models; for example, as shown in FIG. 4, multiple convolutional layers/pooling layers may run in parallel, and the separately extracted features are all input to the neural network layer 130 for processing.
In a preferred implementation, the neural network of the present application may adopt a faster region-based convolutional neural network (faster regions with convolutional neural network, Faster R-CNN). The Faster R-CNN detection algorithm is a typical object detection algorithm. In this algorithm, for an input image, multiple convolutional layers are first used to extract a base feature map of the image. Based on the base feature map, the region proposal network (region proposal net, RPN) in the Faster R-CNN algorithm generates a large number of candidate boxes, which are then screened and filtered so that only a fixed number of candidate boxes are input to the next-level module. A deeper classification analysis is then performed on the fixed number of candidate boxes to finally obtain the final candidate boxes containing targets. It should be noted that the solution provided in this application does not generate a large number of candidate boxes through the RPN, which will be explained later. Faster R-CNN is introduced below with reference to FIG. 5, which is a schematic diagram of a faster region-based convolutional neural network.
As shown in FIG. 5, Faster R-CNN may include four parts: the convolutional layers, the RPN, the region-of-interest pooling layer (ROI pooling), and the classification and regression network. Each is described below. The convolutional layers have been introduced above; they are mainly used to extract features of the image, where the input is the entire image and the output is the extracted features, generally called feature maps. The RPN is used to propose candidate regions; its input is an image and its output is multiple candidate regions. It should be noted that the solution provided in this application does not output candidate regions through the RPN, which will be explained later. It should also be noted that this application sometimes refers to a candidate region as a candidate box; when the difference between the two is not emphasized, they have the same meaning. The ROI pooling process can be understood as pooling the candidate regions. When features are extracted from the original image, a corresponding first feature map is obtained, and each candidate region is then mapped onto the first feature map; this mapping is part of ROI pooling. Generally, a max pooling step is also performed to obtain a second feature map, which is passed on for further computation; the second feature map is the feature map corresponding to the candidate region. The classification and regression network further processes the second feature map and outputs the class to which the candidate region belongs and the position of the candidate region in the image.
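The ROI pooling step above (mapping a candidate region onto the feature map, then max-pooling it to a fixed grid regardless of the region's own size) can be sketched as follows; the region coordinates and the output grid size are illustrative assumptions, not values from the application.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_size=2):
    """Crop the candidate region from a 2-D feature map, then max-pool it
    into a fixed out_size x out_size grid (a rough sketch of ROI pooling)."""
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]
    h, w = region.shape
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            # split the region into a fixed grid of sub-windows and
            # take the maximum of each sub-window
            ys = slice(i * h // out_size, (i + 1) * h // out_size)
            xs = slice(j * w // out_size, (j + 1) * w // out_size)
            out[i, j] = region[ys, xs].max()
    return out
```

Whatever the size of the candidate region, the output is always the same fixed size, which is what allows the later classification and regression network to accept it.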
The solution provided in this application may include two parts: an "inference" process and a "training" process. They are introduced separately below.
1. Inference process: the target determination method.
FIG. 6 is a schematic flowchart of a target determination method provided by an embodiment of the present application.
As shown in FIG. 6, a target determination method provided by an embodiment of the present application may include the following steps:
601. Acquire an image to be processed.
The solution provided in this application can be applied to a variety of scenarios. Specifically, the method shown in FIG. 6 can be applied in fields such as autonomous driving and surveillance.
When the method shown in FIG. 6 is applied to an autonomous driving scenario, the image to be processed in step 601 may be an image acquired by the vehicle through a visual sensor; specifically, the image to be processed may be an image captured by a camera installed on the vehicle.
When the method shown in FIG. 6 is applied to a surveillance scenario, the image to be processed in step 601 may be an image acquired by a visual sensor installed at the roadside; specifically, the image to be processed may be an image captured by a camera installed at the roadside.
The solution provided in this application can acquire the image to be processed through a visual sensor. It should be noted that this application sometimes also refers to a visual sensor as a camera; when the difference between the two is not emphasized, they have the same meaning.
In a possible implementation, the visual sensor may include a lens and an image sensor. The optical image of the scene formed by the lens is projected onto the image sensor, which converts it into an electrical signal; after analog-to-digital (A/D) conversion and other processing, the image to be processed is obtained. The visual sensor may take any of the following specific forms, for example, a camera head, a video camera, a still camera, a scanner, or another device with a photographing function (for example, a mobile phone or a tablet computer).
602. Acquire multiple millimeter-wave detection points.
The multiple millimeter-wave detection points and the image to be processed are synchronously acquired data. Each millimeter-wave detection point includes depth information, which is used to indicate the distance between a detection target and the millimeter-wave radar. The detection target may be any target, such as a vehicle, a person, or a tree. Synchronously acquired data can be understood as the millimeter-wave radar and the image sensor collecting data at the same time, or as the deviation between the frame rates at which the millimeter-wave radar and the image sensor collect data being within a preset range. For example, for the same detection target, the millimeter-wave radar collects millimeter-wave detection points at a first frame rate and the image sensor collects the image to be processed at a second frame rate; if the deviation between the first frame rate and the second frame rate is less than a preset threshold, the millimeter-wave detection points and the image to be processed can be considered synchronously acquired data. The millimeter-wave radar emits high-frequency millimeter waves, which are reflected by the target and collected by the receiving system; the distance to the target is determined by frequency measurement, thereby forming multiple millimeter-wave detection points.
When the method shown in FIG. 6 is applied to an autonomous driving scenario, the millimeter-wave detection points in step 602 may be data acquired by a millimeter-wave radar installed on the vehicle.
When the method shown in FIG. 6 is applied to a surveillance scenario, the millimeter-wave detection points in step 602 may be data acquired by a millimeter-wave radar on a monitoring device installed along the road.
In this application, depth information is sometimes also referred to as distance information; when the difference between the two is not emphasized, both represent the distance between the target detected by the millimeter-wave radar and the millimeter-wave radar itself.
603. Map the multiple millimeter-wave detection points onto the image to be processed.
The present application can map the multiple millimeter-wave detection points onto the image to be processed in a variety of ways; one such way is given below as an example. It should be noted that any method in the related art for mapping multiple millimeter-wave detection points onto an image to be processed may be adopted in the embodiments of the present application.
Mapping the multiple millimeter-wave detection points onto the image to be processed is a spatial fusion of the millimeter-wave radar data and the visual sensor data. Specifically, by unifying the coordinate systems, the millimeter-wave detection points acquired by the millimeter-wave radar can be mapped onto the image to be processed acquired by the visual sensor. The millimeter-wave detection points determined by the millimeter-wave radar and the targets determined by the visual sensor must be in the same coordinate system for better association and matching.
Assume that the visual sensor coordinate system is (Xc, Yc, Zc), the millimeter-wave radar coordinate system is (Xr, Yr, Zr), and the three-dimensional world coordinate system is (Xw, Yw, Zw).
The coordinate system of the millimeter-wave radar can be taken as the reference, and the coordinate system of the millimeter-wave radar can be set to coincide with the world coordinate system, which can be expressed by the following formula:

[Xw, Yw, Zw]^T = [Xr, Yr, Zr]^T    (Formula 1-1)
The image data in the visual sensor coordinate system is mapped to the world coordinate system, so as to obtain the coordinates of that image data in the world coordinate system, which can be expressed by the following formula:

Zc · [u, v, 1]^T = K · [R(θ) | T] · [Xw, Yw, Zw, 1]^T,  where K = [[f/dx, 0, u0], [0, f/dy, v0], [0, 0, 1]] and T = [-a, -b, 0]^T    (Formula 1-2)
Here f denotes the focal length of the visual sensor, (u0, v0) denotes the principal point of the visual sensor, dx and dy denote the pixel unit sizes of the visual sensor in the x and y directions respectively, [-a, -b, 0]^T denotes the translation vector between the installation positions of the visual sensor and the millimeter-wave radar, and θ denotes the rotation angle between the millimeter-wave radar and the visual sensor. According to Formula 1-1 and Formula 1-2 above, the coordinates of the millimeter-wave radar can be converted into the coordinates of the visual sensor, so that the millimeter-wave detection points are mapped onto the image to be processed.
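The two formulas above can be combined into a small projection routine. The sketch below assumes, purely for illustration, that the rotation by θ is about the camera's vertical (y) axis; the actual rotation axis depends on how the two sensors are mounted and is not specified in the text.

```python
import numpy as np

def radar_point_to_pixel(Xw, Yw, Zw, f, dx, dy, u0, v0, a, b, theta):
    """Project a radar point (world frame == radar frame, Formula 1-1)
    into pixel coordinates on the image to be processed (Formula 1-2).
    Assumed: theta rotates about the camera's y axis (illustrative)."""
    R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                  [0.0,           1.0, 0.0],
                  [-np.sin(theta), 0.0, np.cos(theta)]])
    T = np.array([-a, -b, 0.0])              # translation between mounting positions
    Xc, Yc, Zc = R @ np.array([Xw, Yw, Zw]) + T
    K = np.array([[f / dx, 0.0,    u0],      # intrinsic matrix from f, dx, dy, (u0, v0)
                  [0.0,    f / dy, v0],
                  [0.0,    0.0,    1.0]])
    u, v, w = K @ np.array([Xc, Yc, Zc])
    return u / w, v / w                      # pixel coordinates
```

With zero rotation and translation, a point straight ahead of the camera projects onto the principal point (u0, v0), which is a quick sanity check of the conversion.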
604. Determine multiple candidate boxes of the image to be processed according to the depth information and position information of each millimeter-wave detection point.
The position information is used to indicate the position at which each millimeter-wave detection point is mapped onto the image to be processed.
A set of candidate boxes can be determined from the depth information and position information of each millimeter-wave detection point, and the set includes multiple candidate boxes. How the candidate boxes are determined from the depth information and from the position information is described separately below.
In the solution provided in this application, the size of a candidate box is determined according to the depth information of the millimeter-wave detection point. The solution uses the principle of pinhole imaging: the closer the object, the larger its image, and the farther the object, the smaller its image. According to this principle, the larger the depth value of a millimeter-wave detection point, the smaller the candidate box, and the smaller the depth value, the larger the candidate box. In addition, the solution provided in this application can set prior boxes. Multiple prior boxes can be set; specifically, multiple regions with different sizes or aspect ratios can be set as prior boxes, and the candidate boxes are based on these prior boxes, which reduces the training difficulty to a certain extent. The size of a prior box can be determined according to the size of a preset category. For example, the solution provided in this application may recognize three categories, namely trucks, cars, and buses; the average size of trucks, the average size of cars, and the average size of buses can be obtained from a large amount of statistical data. Then, for each millimeter-wave detection point, prior boxes of at least three sizes can be determined, and when the size of a candidate box is determined from the depth information, each of the three prior-box sizes can be adjusted according to the depth information of that millimeter-wave detection point.
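The depth-dependent sizing and the per-category prior boxes described above can be sketched as follows. The reference depth, the base sizes, and the lower-left-corner anchoring are illustrative assumptions only (the application discusses several possible anchoring rules with FIGS. 7a to 7d).

```python
def prior_boxes(u, v, depth, base_sizes, ref_depth=20.0):
    """Generate one prior box per category at radar point (u, v).

    base_sizes: per-category average sizes in pixels at ref_depth metres
                (illustrative values, not from the application).
    Following the pinhole principle, on-image size shrinks as depth grows,
    so each base size is scaled by ref_depth / depth.
    """
    boxes = []
    scale = ref_depth / depth                  # farther target -> smaller box
    for (w, h) in base_sizes:
        bw, bh = w * scale, h * scale
        # assumed anchoring: the radar point sits at the lower-left
        # corner of the box (one of the placements discussed in the text)
        boxes.append((u, v - bh, u + bw, v))   # (x1, y1, x2, y2)
    return boxes
```

Halving the depth doubles the box dimensions, matching the negative correlation between depth and candidate-box size stated above.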
In the solution provided in this application, the position of a candidate box is determined according to the position at which the millimeter-wave detection point is mapped onto the image to be processed. In other words, the position of the candidate box is determined from the position of each millimeter-wave detection point on the image to be processed. The solution determines the position of the candidate box according to the distribution characteristics of the millimeter-wave detection points and their positions on the image to be processed. The millimeter-wave detection points may exhibit different distribution characteristics in different scenarios. In a possible implementation, for each possible application scenario, the distribution characteristics of the millimeter-wave detection points in that scenario can be obtained through extensive experimental statistics, as illustrated below. Assume the solution provided in this application is applied in the field of autonomous driving, so the distribution characteristics of millimeter-wave detection points on vehicles need to be obtained. For example, a vehicle can be placed against a clean background (a clean background can be understood as a scene containing, apart from the vehicle, as few other surrounding targets as possible); the millimeter-wave radar emits high-frequency millimeter waves, which are reflected by the vehicle and collected by the receiving system, yielding one round of statistics. The millimeter-wave radar transmits high-frequency millimeter waves multiple times to collect statistics for the vehicle repeatedly; different vehicles can be substituted, or different numbers of vehicles can be added, with statistics collected multiple times, to obtain the distribution characteristics of the millimeter-wave detection points on vehicles. As another example, according to the application requirements of different scenarios, the distribution characteristics of millimeter-wave detection points on people, on animals, or on goods (such as shipping boxes) can also be obtained.
To better understand how the solution provided in this application determines candidate boxes from the depth information and position information, an autonomous driving scenario is taken as an example below.
FIGS. 7a to 7c are schematic diagrams of application scenarios of a target determination method provided by this application. FIG. 7a is a schematic diagram of the millimeter-wave detection points acquired when the detection target is a vehicle. Each millimeter-wave detection point contains the distance between the target and the millimeter-wave radar. Some of these points are noise produced by multipath reflection or ray tracing, but such points also contain distance information. As shown in FIG. 7b, one millimeter-wave detection point in FIG. 7a is taken as an example to explain how a candidate box is determined from the position information. It should be noted that the process of determining candidate boxes from the position information of each millimeter-wave detection point follows the same principle, so the description is not repeated for each point. As shown in FIG. 7b, assume that, based on the distribution characteristics of millimeter-wave detection points on vehicles, the relationship between a vehicle and its detection points is that a millimeter-wave detection point generally lies at the lower-left corner of the target vehicle; then multiple prior boxes can be determined with the millimeter-wave detection point at the lower-left corner of each prior box, where the number of prior boxes is determined according to the pre-specified target categories. As shown in FIG. 7b, assume three categories are specified in advance, namely a first category, a second category, and a third category, and that a large amount of statistical data gives the average size of the first category as a first size, that of the second category as a second size, and that of the third category as a third size. Then, for the millimeter-wave detection point shown in FIG. 7b, a total of three prior boxes of different sizes can be obtained. As another example, as shown in FIG. 7c, assume that, based on the distribution characteristics of millimeter-wave detection points on vehicles, a millimeter-wave detection point generally lies below the target vehicle; then multiple prior boxes can be determined with the millimeter-wave detection point on the lower edge of each prior box. In a possible implementation, multiple prior boxes can be determined with the millimeter-wave detection point at the center of the lower edge of each prior box; the number and sizes of the prior boxes can be understood according to the description of FIG. 7b and are not repeated here. In a possible implementation, assuming that, based on the distribution characteristics of millimeter-wave detection points on vehicles, a millimeter-wave detection point generally lies below the target vehicle, then, as shown in FIG. 7d, multiple prior boxes can be determined with the millimeter-wave detection point at the middle of the lower edge of each prior box.
From FIGS. 7a to 7d it can be seen that the solution provided by this application can determine multiple prior boxes according to the distribution characteristics of the millimeter-wave detection points on a certain target, such as their distribution characteristics on a vehicle. It should be noted that determining multiple prior boxes with the millimeter-wave detection point on the lower edge of the prior box, or at the lower-left corner of the prior box, as in FIGS. 7a to 7d, is only a preferred solution of the scheme provided in this application. In some possible implementations, other ways of determining the prior boxes can of course be selected according to the distribution characteristics of the millimeter-wave detection points on the target; for example, multiple candidate boxes can be determined with the millimeter-wave detection point at the center of the prior box, or at any position on the left edge of the prior box. By determining the position of the candidate box according to the distribution characteristics of the millimeter-wave detection points on the target, this application can frame the target better; in other words, it can better associate the millimeter-wave detection point with the position of the target.
In a possible implementation, when the millimeter-wave radar is installed at the front bumper of an autonomous vehicle, extensive experiments conducted for this application show that the millimeter-wave detection points are mostly distributed on the bottom and sides of the vehicle.
FIG. 7e is a schematic diagram of an application scenario provided by an embodiment of this application. As shown in FIG. 7e, two millimeter-wave detection points are taken as an example to explain how a candidate box is determined from the depth information of a millimeter-wave detection point. As shown in FIG. 7e, assuming that the depth value of millimeter-wave detection point A is smaller than that of millimeter-wave detection point B, the size of the candidate box determined from detection point A should be larger than the size of the candidate box determined from detection point B. The negative correlation between depth information and candidate-box size can be understood with reference to the principle of pinhole imaging and is not described again in this application.
605. Perform non-maximum suppression (non-maximum suppression, NMS) processing on the multiple candidate boxes according to the depth information, so as to output a target box and the target millimeter-wave detection point corresponding to the detection target in the target box.
In other words, NMS processing is performed on the multiple candidate boxes according to the depth information to output the target box and the target millimeter-wave detection point, where the target box is determined according to the depth information and position information of the target millimeter-wave detection point. This application takes one target as an example for description, but it should be noted that the solution provided in this application is equally applicable when there are multiple targets.
Non-maximum suppression means suppressing elements that are not maxima. The method is mainly intended to reduce the number of candidate boxes. In step 604, a large number of candidate boxes can be determined from the depth information and position information of the millimeter-wave detection points; after passing through the classifier, each candidate box has a probability value of belonging to some category, and each candidate box also corresponds to the depth value of one millimeter-wave detection point. The redundant candidate boxes can be removed by the NMS method to determine the final candidate boxes. It should be noted that this application sometimes also refers to a final candidate box as a target box; when the difference between the two is not emphasized, both represent a box output after NMS processing, which indicates the location of a target.
The input of NMS is N candidate boxes that have been sorted by score from high to low. When there are multiple targets, multiple candidate boxes are output; for example, the M highest-scoring, unsuppressed candidate boxes are output, where N is a positive integer greater than M, and the score of a candidate box is determined according to the depth information. For example, suppose the targets fall into three categories: a first category, a second category, and a third category. Suppose that, through extensive statistics or neural-network learning, the probability distribution between target size and depth information is determined for the first category (call it distribution A), for the second category (distribution B), and for the third category (distribution C). Then, from the relationship between the depth information of the target in a candidate box and distributions A, B, and C, the probability that the target in the candidate box belongs to a certain category can be determined.
To better understand the solution provided by this application, performing NMS processing on multiple candidate boxes according to depth information is illustrated below with an example. Suppose there are six candidate boxes. For each category, the probability that each candidate box belongs to that category is sorted according to the depth information; suppose that, for a certain category, the probabilities of belonging to that category satisfy A < B < C < D < E < F from small to large. Starting from the highest-probability candidate box F, determine whether the intersection over union (IoU) of each of candidate boxes A, B, C, D, and E with F exceeds a set threshold; the IoU represents the degree of overlap between two candidate boxes. Suppose the overlaps of candidate boxes B and D with F exceed the threshold; then B and D are discarded, and F is marked as the first retained candidate box. From the remaining candidate boxes A, C, and E, select the one with the highest probability, E, and then check the overlap of A and C with E; any box whose overlap exceeds the threshold is discarded, and E is marked as the second retained candidate box. This process is repeated until all retained candidate boxes are found; these are the final candidate boxes. The millimeter-wave detection points corresponding to the detection targets in the final candidate boxes are then output.
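The greedy suppression loop described in this example can be sketched in Python as follows; the 0.5 IoU threshold and the box coordinates in the test are illustrative assumptions, not values fixed by the application.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, discard boxes that
    overlap it beyond the threshold, then repeat on the remainder."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

In the depth-aware variant of this application, the `scores` passed in would be derived from the depth information rather than from the classifier alone.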
It can be seen from the embodiment corresponding to FIG. 6 that multiple candidate boxes of the image to be processed are determined from the position information and depth information of the millimeter-wave detection points, and NMS processing is performed on the candidate boxes according to the depth information. When the final candidate box is determined, the millimeter-wave detection point associated with that box can be output, improving the accuracy of target matching.
As shown in the embodiment corresponding to FIG. 6, NMS processing may be performed on multiple candidate boxes according to the depth information alone; in some possible implementations, NMS processing may also be performed according to the depth information combined with other information. In addition, there are multiple ways to determine the probability distribution between the depth information and a given category. On the basis of the embodiment corresponding to FIG. 6, that embodiment is further refined and extended below.
FIG. 8 is a schematic flowchart of another target determination method provided by an embodiment of this application.
As shown in FIG. 8, another target determination method provided by an embodiment of this application may include the following steps:
801. Acquire an image to be processed.
802. Acquire multiple millimeter-wave detection points.
803. Map the multiple millimeter-wave detection points onto the image to be processed.
804. Determine multiple candidate boxes of the image to be processed according to the depth information and position information of each millimeter-wave detection point.
Steps 801 to 804 can be understood with reference to steps 601 to 604 in the embodiment corresponding to FIG. 6 and are not repeated here.
805. Perform NMS processing on the multiple candidate boxes according to a first score and a second score.
The first score represents the probability, determined by the classifier, that the detection target in each candidate box belongs to each of N categories, where the N categories are preset and N is a positive integer. The second score represents the probability, determined according to a first probability distribution between the depth information and each category, that the detection target in each candidate box belongs to each of the N categories.
In a possible implementation, the input of NMS is N candidate boxes sorted by score from high to low, and the output is the M highest-scoring, unsuppressed candidate boxes, where N is a positive integer greater than M and the score of each candidate box is determined according to the product of the first score and the second score.
This embodiment of the application does not limit the type of classifier. The classifier scores each input candidate box; the higher the score, the greater the probability that the candidate box contains a target of the corresponding category. Any technique from the related art for determining the score of each candidate box with a classifier may be adopted in the embodiments of this application. For the candidate boxes processed by the regression network, if NMS processing is performed only according to the first score, the situation shown in FIG. 9a may occur, that is, the results contain a large amount of repetition and interference. By introducing the depth information of the millimeter-wave detection points, this method adds a dimension for determining which millimeter-wave detection point corresponds to a candidate region, improving the accuracy of the association and, at the same time, the accuracy of target detection. As shown in FIG. 9a, suppose the candidate boxes are sorted from high to low by the first score; then, after NMS processing based on the first score alone, three millimeter-wave detection points may be associated with the final output candidate box. By comparing the depth information of these three millimeter-wave detection points, and supposing that the depth information of millimeter-wave detection point A has the highest probability of corresponding to the category of the candidate box, after NMS processing the final candidate box and millimeter-wave detection point A are output, as shown in FIG. 9b. The description in this paragraph is intended to aid understanding of how introducing depth information improves the accuracy of the association. In a possible implementation, the score of each candidate box can be determined according to the following formulas, that is, according to the first score and the second score. The input of NMS is N candidate boxes sorted from high to low by the score determined from the first score and the second score, and the output is the M highest-scoring, unsuppressed final candidate boxes. The score of each candidate box can be expressed as:
score = p(depth) = ∑_classes p(depth, class) = ∑_classes p(depth|class) · p(class)
p(class) = softmax(classes)
p(depth|class) ~ N(mean_height(class), std_height(class))
Here, score denotes the score determined from the first score and the second score; depth denotes the depth information of each millimeter-wave detection point; classes denotes the target categories, whose number is preset as described above and is not repeated here. p(A, B) denotes the probability of A and B occurring simultaneously, i.e., the joint distribution of depth information and category; p(A|B) denotes the probability of A given B, i.e., the probability distribution of depth information for a given category; mean denotes the mean; std denotes the standard deviation; and N denotes a Gaussian distribution.
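The score formula above can be sketched directly in Python, with p(depth|class) taken as a Gaussian and p(class) from a softmax over classifier logits. The per-class depth statistics and logits used below are illustrative assumptions, not values from the application.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std) at x, used for p(depth | class)."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def softmax(logits):
    """Numerically stable softmax, used for p(class)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def depth_score(depth, class_logits, depth_stats):
    """score = sum over classes of p(depth | class) * p(class).

    `class_logits` come from the classifier; `depth_stats` maps each class
    index to the (mean, std) of depths observed for that class. Both are
    hypothetical inputs for illustration.
    """
    p_class = softmax(class_logits)
    return sum(gaussian_pdf(depth, *depth_stats[c]) * p_class[c]
               for c in range(len(class_logits)))
```

A candidate box whose associated detection-point depth matches a class's typical depth receives a higher score than one whose depth is implausible for every class.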
It can be seen from the embodiment corresponding to FIG. 8 that the solution provided by this application performs NMS processing on multiple candidate boxes using the first score and the second score, improving the accuracy of data association, that is, the accuracy of the one-to-one matching of the same target across different sensors.
On the basis of the embodiments corresponding to FIG. 6 and FIG. 8, how to determine the probability distribution between the depth information and a given category is described below.
In a possible implementation, statistics are computed over the data in a first set to determine the probability distribution of a first size of the statistical targets corresponding to each category, where the first set includes multiple statistical targets corresponding to each category and the size information of each statistical target. A first probability distribution is determined according to the probability distribution of the first size and a first relationship, where the first relationship is the relationship between the size of a statistical target and the depth information of the millimeter-wave detection point corresponding to that statistical target. For example, suppose the first set includes three categories, namely trucks, cars, and buses, with 1000 statistical targets (samples) for trucks, 1000 for cars, and 1000 for buses. Each statistical target includes size information; for example, statistical target A among the 1000 truck targets includes size information such as its physical dimensions, or at least one of its length, width, or height. From the category of each statistical target and its size information, the probability distribution of the first size can be obtained, that is, the probability distribution of the sizes of the statistical targets under each category. FIG. 10 shows a schematic diagram of the probability distribution of the first size when the size information is the height of the target. In addition, the relationship between the depth information and the size of the target can be determined via the pinhole imaging principle; in a possible implementation, this relationship can be obtained by adjusting the distance between the target and the millimeter-wave radar multiple times. Once both the relationship between the depth information and the size of the target and the probability distribution of target size under each category are obtained, the probability distribution between the depth information and each category can be determined.
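The per-category size statistics described above can be sketched as a simple aggregation; the sample heights below are made-up values standing in for a labelled statistical set.

```python
import math
from collections import defaultdict

def size_distribution(samples):
    """Estimate (mean, std) of target size per category from labelled samples.

    `samples` is a list of (category, size) pairs, e.g. heights in metres.
    The resulting per-category Gaussian parameters stand in for the
    'probability distribution of the first size' in the text.
    """
    by_class = defaultdict(list)
    for cls, size in samples:
        by_class[cls].append(size)
    stats = {}
    for cls, sizes in by_class.items():
        mean = sum(sizes) / len(sizes)
        var = sum((s - mean) ** 2 for s in sizes) / len(sizes)
        stats[cls] = (mean, math.sqrt(var))
    return stats
```

Combined with the pinhole relation between size and depth, these per-category statistics yield the depth-versus-category distribution used for the second score.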
In a possible implementation, the first set may be updated, and the probability distribution between the depth information and each category determined from the updated set. For example, statistics may be computed over the data in a second set to determine a second size distribution of the statistical targets corresponding to each category; the second size distribution is used to update the first size distribution, and the second set includes multiple statistical targets corresponding to each category and the size information of each statistical target. The first probability distribution is determined according to the second size distribution and a second relationship, where the second relationship is the relationship between the size of a statistical target and the depth information of the millimeter-wave detection point corresponding to that statistical target.
It should be noted that, in addition to the steps described in detail above, the solution provided by this application may include other steps, which are not limited in the embodiments of this application. A description is given below with reference to a specific embodiment.
FIG. 11 is a schematic flowchart of another target determination method provided by an embodiment of this application.
As shown in FIG. 11, an image is collected by a vision sensor, millimeter-wave detection points are acquired by a millimeter-wave radar, and the millimeter-wave detection points are time-aligned with the image frames in the video. In a possible implementation, the sampling frequency of the vision sensor is a first frequency, the sampling frequency of the millimeter-wave radar is a second frequency, and the difference between the first frequency and the second frequency is not greater than a preset threshold. The millimeter-wave detection points are mapped onto the image collected by the vision sensor, and the image with the mapped millimeter-wave detection points is input into a convolutional neural network, through which a first feature map of the image can be obtained. Multiple candidate boxes are generated according to the positions of the millimeter-wave detection points on the image and their depth information, and second feature maps corresponding to the multiple candidate boxes are extracted from the first feature map. The second feature maps are processed by the classification layer and the regression layer, and the processed results undergo NMS processing according to the depth information, so as to output the detection result of the image and the millimeter-wave detection point associated with that detection result.
FIG. 12 is a schematic diagram of an application scenario of a target determination method provided by this application. As shown in FIG. 12, the solution provided by this application can be applied in the field of autonomous driving, where it can detect and recognize objects on the road, for example vehicles: it can detect the position of a vehicle within the image captured by the vision sensor, the category of the vehicle, and the distance between the vehicle and the ego vehicle, where the ego vehicle is the vehicle on which the vision sensor is installed. As shown in FIG. 12, for each target, the detection result of the vision sensor and the detection result of the millimeter-wave detection point are matched and associated one to one. It should be noted that the solution provided by this application can be applied in any scenario where the target-level detection results of a vision sensor and a millimeter-wave radar need to be associated. For example, the solution can be applied in a surveillance scenario, where targets in the monitored area, such as vehicles or people, can be detected and recognized; for each such target, the detection result of the vision sensor and the detection result of the millimeter-wave detection point are matched and associated one to one.
Referring to FIG. 13, which compares the association results determined by the target determination method provided by this application with the results of a first scheme, the first scheme being a method that does not perform NMS processing according to depth information. The application can be tested on a self-built database, which may include multiple images mapped with millimeter-wave detection points and carrying manual category annotations. AP50 denotes the average precision when the overlap (IoU) between the final candidate box in the image and the target object to be detected is at least 50%. It can be understood that in the method provided by this application, performing NMS processing according to the first score determined by the classifier and the second score determined from the depth information improves the accuracy of target detection. As can be seen from FIG. 13, with the target determination method provided by this application, the AP50 of the convolutional neural network's image recognition output is significantly higher than that of the first scheme. The target determination method provided by this application can therefore also improve the accuracy of target detection.
2. Training process: a model training method.
FIG. 14 is a schematic flowchart of a model training method provided by an embodiment of this application.
As shown in FIG. 14, a model training method provided by an embodiment of this application may include the following steps:
1401. Acquire training data.
The training data includes multiple training images mapped with millimeter-wave detection points. The training images and the millimeter-wave detection points are data acquired synchronously for the same target. This application sometimes also refers to the target as the target object; when the distinction between the two is not emphasized, the two have the same meaning.
The training images carry label information of the target object, which can be obtained by manual annotation. A training image is the original image used to train the target detection model, and the label information of the target object can be understood as the ground truth (GT) used to train the target detection model.
How to map the millimeter-wave detection points onto the training images can be understood with reference to the step of mapping multiple millimeter-wave detection points onto the image to be processed (step 603) in the embodiment corresponding to FIG. 6; details are not repeated here.
1402. Determine multiple candidate boxes of the training image according to the depth information and position information of each millimeter-wave detection point.
A set of candidate boxes, including multiple candidate boxes, can be determined according to the depth information and position information of each millimeter-wave detection point. This can be understood with reference to step 604 in the embodiment corresponding to FIG. 6 and is not repeated here.
1403. Train the model according to the features corresponding to the multiple candidate boxes to obtain a trained model.
The training data can be input into the model; for example, the model may be Fast R-CNN, Faster R-CNN, a mask region-based convolutional neural network (Mask R-CNN), and so on. A first feature map of the training data can be obtained through the model; multiple candidate boxes are generated according to the positions of the millimeter-wave detection points on the image and their depth information, and second feature maps corresponding to the multiple candidate boxes are extracted from the first feature map. The model can be trained according to the second feature maps, and training can be determined to be complete when the loss function of the model converges.
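The "train until the loss function converges" criterion can be sketched in a framework-agnostic way; `step_fn` stands in for one optimization step of whichever detection model is used, and the tolerance value is an illustrative assumption.

```python
def train_until_converged(step_fn, tol=1e-4, max_steps=1000):
    """Run training steps until the loss change falls below `tol`.

    `step_fn` is a hypothetical callable that performs one optimization
    step and returns the scalar loss; this sketches only the convergence
    check, not the detection model itself.
    Returns (steps_run, final_loss).
    """
    prev = float("inf")
    for step in range(max_steps):
        loss = step_fn()
        if abs(prev - loss) < tol:   # loss has stopped improving
            return step + 1, loss
        prev = loss
    return max_steps, prev
```

In practice the loss would combine the classification and box-regression terms of the chosen R-CNN variant.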
The foregoing has described in detail the flows of the target determination method and the model training method provided by this application. Based on them, the target determination apparatus and the model training apparatus provided by this application are described below: the target determination apparatus is configured to perform the steps of the methods corresponding to the foregoing FIGS. 6 to 12, and the model training apparatus is configured to perform the steps of the method corresponding to the foregoing FIG. 14.
Referring to FIG. 15, a schematic structural diagram of a target determination apparatus provided by this application. The target determination apparatus includes:
An acquisition module 1501, configured to acquire an image to be processed and multiple millimeter-wave detection points, where the image to be processed and the multiple millimeter-wave detection points are data acquired synchronously for the same target. The image to be processed can be acquired by a vision sensor, and the millimeter-wave detection points can be acquired by a millimeter-wave radar. Each millimeter-wave detection point may include depth information, which represents the distance between the detection target and the millimeter-wave radar, and the millimeter-wave radar is used to acquire the multiple millimeter-wave detection points. A mapping module, configured to map the multiple millimeter-wave detection points acquired by the acquisition module 1501 onto the image to be processed acquired by the acquisition module 1501. A processing module 1502, configured to determine multiple candidate boxes of the detection target on the image to be processed according to first information, where the first information may include the depth information and position information of each millimeter-wave detection point, the position information representing the position at which each millimeter-wave detection point is mapped on the image to be processed. The processing module 1502 is further configured to perform non-maximum suppression (NMS) processing on the multiple candidate boxes according to the depth information, so as to output a target box and a target millimeter-wave detection point (that is, to output the target-level detection results of different sensors for the same detection target), i.e., to output the association result. The target box is determined according to the depth information and position information of the target millimeter-wave detection point.
In a possible implementation, the processing module 1502 is specifically configured to perform non-maximum suppression (NMS) processing on the multiple candidate boxes according to a first score and a second score, where the first score represents the probability, determined by the classifier, that the detection target in each candidate box belongs to each of N categories, the N categories being preset and N being a positive integer, and the second score represents the probability, determined according to a first probability distribution between the depth information and each category, that the detection target in each candidate box belongs to each of the N categories.
In a possible implementation, the target determination apparatus may further include a statistics module 1503, configured to compute statistics over the data in a first set to determine the probability distribution of a first size of the statistical targets corresponding to each category, where the first set may include multiple statistical targets corresponding to each category and the size information of each statistical target, and to determine a first probability distribution according to the probability distribution of the first size and a first relationship, where the first relationship is the relationship between the size of a statistical target and the depth information of the millimeter-wave detection point corresponding to that statistical target.
In a possible implementation, the statistics module 1503 is further configured to compute statistics over the data in a second set to determine the probability distribution of a second size of the statistical targets corresponding to each category, where the second set may include multiple statistical targets corresponding to each category and the size information of each statistical target, and to determine a second probability distribution according to the second size distribution and a second relationship, where the second probability distribution is used to update the first probability distribution, and the second relationship is the relationship between the size of a statistical target and the depth information of the millimeter-wave detection point corresponding to that statistical target.
在一个可能的实施方式中,尺寸信息为统计目标的高度信息。In a possible implementation, the size information is height information of the statistical target.
在一个可能的实施方式中,位置信息用于结合毫米波探测点在车辆上的分布特性确定候选框在待处理图像中的位置。In a possible implementation, the position information is used to determine the position of the candidate frame in the image to be processed in combination with the distribution characteristics of the millimeter wave detection points on the vehicle.
In a possible implementation, the depth information is used to determine the size of a candidate box, and the size of the candidate box is negatively correlated with the depth information.
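The negative correlation between candidate-box size and depth can be sketched as follows (illustrative only; the scaling constants and the box parameterization are assumptions, not values from the application):

```python
def candidate_box(u, v, depth_m, base_size=600.0, aspect=1.0, k=1.0):
    """Sketch of a depth-scaled candidate box centred on a projected
    millimetre-wave point (u, v): the box side shrinks as depth grows,
    i.e. size is negatively correlated with depth.  base_size, aspect,
    and k are illustrative constants."""
    side = k * base_size / max(depth_m, 1e-6)   # inverse relation to depth
    w, h = side * aspect, side
    return (u - w / 2, v - h / 2, u + w / 2, v + h / 2)

near = candidate_box(320, 240, depth_m=10.0)   # close target -> large box
far = candidate_box(320, 240, depth_m=40.0)    # distant target -> small box
```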
In a possible implementation, the processing module 1502 is further configured to: perform convolution processing on the image to be processed to obtain a first feature map of the image to be processed; extract second feature maps corresponding to the plurality of candidate boxes from the first feature map; and process the second feature maps through a regression network and a classifier to obtain a first result, where the first result is used for non-maximum suppression (NMS) processing.
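The three-step flow above (convolution to a first feature map, per-box crops as second feature maps, then a regression/classification head producing the first result for NMS) can be sketched schematically. This toy code only mimics the data flow: the "backbone" is plain downsampling and the "head" is a stub, so none of it reflects the actual network described in the application:

```python
def conv_backbone(image):
    """Stand-in for the convolutional backbone: downsamples the image
    by a stride of 4 to produce a toy 'first feature map'."""
    return [row[::4] for row in image[::4]]

def roi_crop(feature_map, box, stride=4):
    """Extract the 'second feature map' for one candidate box by
    cropping the first feature map at the box location."""
    x0, y0, x1, y1 = (int(c // stride) for c in box)
    return [row[x0:x1 + 1] for row in feature_map[y0:y1 + 1]]

def head(second_map):
    """Toy regression + classification head: returns a (zero) box
    offset and a score derived from the crop statistics."""
    flat = [v for row in second_map for v in row]
    mean = sum(flat) / len(flat) if flat else 0.0
    return {"offset": (0.0, 0.0), "score": mean}

image = [[float((x + y) % 7) for x in range(64)] for y in range(64)]
fmap = conv_backbone(image)                  # first feature map
boxes = [(8, 8, 24, 24), (32, 32, 56, 56)]  # hypothetical candidate boxes
results = [head(roi_crop(fmap, b)) for b in boxes]
# 'results' plays the role of the first result fed into NMS.
```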
Optionally, with reference to the second aspect or any one of the first to seventh possible implementations of the second aspect, in an eighth possible implementation, the image to be processed is acquired by a vision sensor, the sampling frequency of the vision sensor is a first frequency, the sampling frequency of the millimeter-wave radar is a second frequency, and the difference between the first frequency and the second frequency is not greater than a preset threshold.
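A minimal sketch of enforcing the frequency constraint, together with one simple way of obtaining "synchronously acquired" camera/radar pairs by nearest-timestamp association (the threshold values and function names are illustrative assumptions, not from the application):

```python
def synchronised(cam_hz, radar_hz, max_diff_hz=2.0):
    """Check that the camera and radar sampling frequencies differ
    by no more than a preset threshold (threshold value illustrative)."""
    return abs(cam_hz - radar_hz) <= max_diff_hz

def pair_frames(cam_ts, radar_ts, tol=0.05):
    """Associate each camera timestamp with the nearest radar sweep,
    keeping only pairs within a tolerance (seconds)."""
    pairs = []
    for t in cam_ts:
        nearest = min(radar_ts, key=lambda r: abs(r - t))
        if abs(nearest - t) <= tol:
            pairs.append((t, nearest))
    return pairs

ok = synchronised(30.0, 30.0)
pairs = pair_frames([0.00, 0.033, 0.066], [0.001, 0.034, 0.090])
```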
Refer to FIG. 16, a schematic structural diagram of a model training apparatus provided by the present application. The model training apparatus includes:
an acquiring module 1601, configured to perform step 1401 in the embodiment corresponding to FIG. 14; and
a training module 1602, configured to perform steps 1402 and 1403 in the embodiment corresponding to FIG. 14.
Refer to FIG. 17, a schematic structural diagram of another target determination apparatus provided by the present application, described below.
The target determination apparatus may include a processor 1701 and a memory 1702. The processor 1701 and the memory 1702 are interconnected by a line. The memory 1702 stores program instructions and data.
The memory 1702 stores the program instructions and data corresponding to the steps in FIG. 6 or FIG. 8.
The processor 1701 is configured to perform the method steps performed by the target determination apparatus in any of the embodiments shown in FIG. 6 or FIG. 8.
An embodiment of the present application further provides a computer-readable storage medium storing a program for generating a vehicle travel speed. When the program runs on a computer, the computer is caused to perform the steps of the method described in the embodiments shown in FIG. 6 or FIG. 8.
Optionally, the target determination apparatus shown in FIG. 17 is a chip.
An embodiment of the present application further provides a digital processing chip. The digital processing chip integrates circuits for implementing the processor 1701, or the functions of the processor 1701, and one or more interfaces. When a memory is integrated in the digital processing chip, the digital processing chip can complete the method steps of any one or more of the foregoing embodiments. When no memory is integrated in the digital processing chip, it can be connected to an external memory through a communication interface, and it implements the actions performed by the target determination apparatus in the foregoing embodiments according to program code stored in the external memory.
An embodiment of the present application further provides a computer program product. When the computer program product runs on a computer, the computer is caused to perform the steps performed by the target determination apparatus in the method described in the embodiment shown in FIG. 6 or FIG. 8.
The target determination apparatus provided in the embodiments of the present application may be a chip. The chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit can execute the computer-executable instructions stored in a storage unit, so that the chip in the server performs the target determination method described in the embodiment shown in FIG. 6 or FIG. 8. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
Specifically, the aforementioned processing unit or processor may be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
Specifically, refer to FIG. 18, a schematic structural diagram of a chip provided by an embodiment of the present application. The chip may be embodied as a neural-network processing unit NPU 180, which is mounted as a coprocessor on a host CPU; the host CPU allocates tasks. The core part of the NPU is the operation circuit 1803, and the controller 1804 controls the operation circuit 1803 to fetch matrix data from memory and perform multiplication operations.
In some implementations, the operation circuit 1803 internally includes a plurality of processing engines (PEs). In some implementations, the operation circuit 1803 is a two-dimensional systolic array. The operation circuit 1803 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 1803 is a general-purpose matrix processor.
For example, assume there are an input matrix A, a weight matrix B, and an output matrix C. The operation circuit fetches the data corresponding to matrix B from the weight memory 1802 and buffers it on each PE in the operation circuit. The operation circuit fetches the data of matrix A from the input memory 1801, performs a matrix operation with matrix B, and stores the partial or final results of the matrix in an accumulator 1808.
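The accumulate-as-you-stream pattern described for matrices A and B can be sketched in plain code. This is a behavioural sketch only: a real systolic array pipelines the multiply-accumulates across PEs in hardware, whereas here the accumulation into C simply mirrors the role of accumulator 1808:

```python
def systolic_matmul(A, B, tile=2):
    """Sketch of tiled matrix multiplication: B is 'streamed' one tile
    of rows at a time and partial products are summed into C, which
    plays the role of the accumulator."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]        # the accumulator
    for k0 in range(0, k, tile):             # stream one tile of B at a time
        for i in range(n):
            for j in range(m):
                for kk in range(k0, min(k0 + tile, k)):
                    C[i][j] += A[i][kk] * B[kk][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = systolic_matmul(A, B)
# C == [[19, 22], [43, 50]]
```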
The unified memory 1806 is used to store input data and output data. Weight data is transferred to the weight memory 1802 through a direct memory access controller (DMAC) 1805. Input data is also transferred to the unified memory 1806 through the DMAC.
A bus interface unit (BIU) 1810 is used for the interaction between the AXI bus and the DMAC and the instruction fetch buffer (IFB) 1809.
The bus interface unit 1810 is used by the instruction fetch buffer 1809 to obtain instructions from an external memory, and is also used by the storage unit access controller 1805 to obtain the original data of input matrix A or weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1806, to transfer weight data to the weight memory 1802, or to transfer input data to the input memory 1801.
The vector calculation unit 1807 includes a plurality of operation processing units and, when needed, further processes the output of the operation circuit, for example with vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. It is mainly used for non-convolution/fully connected layer computations in neural networks, such as batch normalization, pixel-level summation, and upsampling of feature planes.
In some implementations, the vector calculation unit 1807 can store the processed output vectors to the unified memory 1806. For example, the vector calculation unit 1807 may apply a linear function and/or a non-linear function to the output of the operation circuit 1803, for example to perform linear interpolation on the feature planes extracted by a convolutional layer, or to apply a function to a vector of accumulated values to generate activation values. In some implementations, the vector calculation unit 1807 generates normalized values, pixel-level summed values, or both. In some implementations, the vector of processed outputs can be used as an activation input to the operation circuit 1803, for example for use in subsequent layers of a neural network.
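Two of the vector-unit operations named above, batch normalization and feature-plane upsampling, can be sketched as follows. These are pure-Python stand-ins for illustration only; the hardware operates on on-chip vectors, not Python lists:

```python
def batch_norm(values, eps=1e-5):
    """Vector-unit style batch normalisation over a flat list:
    subtract the mean and divide by the standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = (var + eps) ** 0.5
    return [(v - mean) / scale for v in values]

def upsample2x(plane):
    """Nearest-neighbour 2x upsampling of a feature plane, one of the
    non-convolution operations listed for the vector calculation unit."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in (0, 1)]  # duplicate each column
        out.extend([wide, list(wide)])           # duplicate each row
    return out

normed = batch_norm([1.0, 2.0, 3.0, 4.0])
up = upsample2x([[1, 2], [3, 4]])
# up == [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```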
The instruction fetch buffer 1809 connected to the controller 1804 is used to store instructions used by the controller 1804.
The unified memory 1806, the input memory 1801, the weight memory 1802, and the instruction fetch buffer 1809 are all on-chip memories. The external memory is private to the NPU hardware architecture.
The operations of each layer in the recurrent neural network may be performed by the operation circuit 1803 or the vector calculation unit 1807.
The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the program of the method in FIG. 6 or FIG. 8.
In addition, it should be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the solutions of the embodiments. Moreover, in the drawings of the apparatus embodiments provided in the present application, the connection relationships between modules indicate that there are communication connections between them, which may be specifically implemented as one or more communication buses or signal lines.
From the description of the above implementations, those skilled in the art can clearly understand that the present application may be implemented by software plus necessary general-purpose hardware, or of course by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function completed by a computer program can be easily implemented by corresponding hardware, and the specific hardware structures used to implement the same function can also be diverse, such as analog circuits, digital circuits, or dedicated circuits. However, for the present application, a software program implementation is in most cases the better implementation. Based on this understanding, the technical solutions of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device such as a server or a data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state drive (SSD)), or the like.
The terms "first", "second", and the like in the specification, claims, and drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used may be interchanged under appropriate circumstances, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. The term "and/or" in the present application merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate the following three cases: A exists alone, both A and B exist, and B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects.
Furthermore, the terms "including" and "having", and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device including a series of steps or modules is not necessarily limited to the expressly listed steps or modules, but may include other steps or modules not expressly listed or inherent to the process, method, product, or device. The naming or numbering of steps in the present application does not mean that the steps in a method procedure must be performed in the time or logical order indicated by the naming or numbering; the execution order of the named or numbered procedure steps may be changed according to the technical purpose to be achieved, as long as the same or similar technical effects can be achieved. The division of modules in the present application is a logical division; in actual implementation there may be other division manners, for example, multiple modules may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some ports, and indirect couplings or communication connections between modules may be in electrical or other similar forms, none of which is limited in the present application. Moreover, modules or sub-modules described as separate components may or may not be physically separate, may or may not be physical modules, or may be distributed into multiple circuit modules; some or all of them may be selected according to actual needs to achieve the purposes of the solutions of the present application.

Claims (22)

  1. A target determination method, characterized by comprising:
    acquiring an image to be processed and a plurality of millimeter-wave detection points, where the image to be processed and the plurality of millimeter-wave detection points are data acquired synchronously for a same detection target, each millimeter-wave detection point includes depth information, the depth information is used to indicate a distance between the detection target and a millimeter-wave radar, and the millimeter-wave radar is used to acquire the plurality of millimeter-wave detection points;
    mapping the plurality of millimeter-wave detection points onto the image to be processed;
    determining a plurality of candidate boxes of the detection target on the image to be processed according to first information, where the first information includes the depth information and position information of each millimeter-wave detection point, and the position information is used to indicate a position at which each millimeter-wave detection point is mapped onto the image to be processed; and
    performing non-maximum suppression (NMS) processing on the plurality of candidate boxes according to the depth information, to output a target box and a target millimeter-wave detection point, where the target box is determined according to the depth information and the position information of the target millimeter-wave detection point.
  2. The method according to claim 1, characterized in that the performing non-maximum suppression (NMS) processing on the plurality of candidate boxes according to the depth information comprises:
    performing NMS processing on the plurality of candidate boxes according to a first score and a second score, where the first score indicates the probability, determined by a classifier, that the detection target in each candidate box belongs to each of N categories, the N categories are preset categories, N is a positive integer, and the second score indicates the probability, determined according to a first probability distribution between the depth information and each category, that the detection target in each candidate box belongs to each of the N categories.
  3. The method according to claim 2, characterized in that the method further comprises:
    performing statistics on data in a first set to determine a probability distribution of a first size of the statistical targets corresponding to each category, where the first set includes a plurality of statistical targets corresponding to each category and size information of each statistical target; and
    determining the first probability distribution according to the probability distribution of the first size and a first relationship, where the first relationship is a relationship between the size of a statistical target and the depth information of the millimeter-wave detection point corresponding to the statistical target.
  4. The method according to claim 3, characterized in that the method further comprises:
    performing statistics on data in a second set to determine a probability distribution of a second size of the statistical targets corresponding to each category, where the second set includes a plurality of statistical targets corresponding to each category and size information of each statistical target; and
    determining a second probability distribution according to the second size distribution and a second relationship, where the second probability distribution is used to update the first probability distribution, and the second relationship is a relationship between the size of a statistical target and the depth information of the millimeter-wave detection point corresponding to the statistical target.
  5. The method according to claim 3 or 4, characterized in that the size information is height information of the statistical target.
  6. The method according to any one of claims 1 to 5, characterized in that the position information is used, in combination with the distribution characteristics of the millimeter-wave detection points on a vehicle, to determine the position of a candidate box in the image to be processed.
  7. The method according to any one of claims 1 to 6, characterized in that the depth information is used to determine the size of a candidate box, and the size of the candidate box is negatively correlated with the depth information.
  8. The method according to any one of claims 1 to 7, characterized in that the method further comprises:
    processing the image to be processed through a faster region-based convolutional neural network (Faster-RCNN) to obtain a first feature map of the image to be processed;
    extracting second feature maps corresponding to the plurality of candidate boxes from the first feature map; and
    processing the second feature maps through a regression network and a classifier to obtain a first result, where the first result is used for non-maximum suppression (NMS) processing.
  9. The method according to any one of claims 1 to 8, characterized in that the image to be processed is acquired by a vision sensor, the sampling frequency of the vision sensor is a first frequency, the sampling frequency of the millimeter-wave radar is a second frequency, and the difference between the first frequency and the second frequency is not greater than a preset threshold.
  10. A target determination apparatus, characterized by comprising:
    an acquisition module, configured to acquire an image to be processed and a plurality of millimeter-wave detection points, where the image to be processed and the plurality of millimeter-wave detection points are data acquired synchronously, each millimeter-wave detection point includes depth information, the depth information is used to indicate a distance between a detection target and a millimeter-wave radar, and the millimeter-wave radar is used to acquire the plurality of millimeter-wave detection points;
    a mapping module, configured to map the plurality of millimeter-wave detection points acquired by the acquisition module onto the image to be processed acquired by the acquisition module; and
    a processing module, configured to determine a plurality of candidate boxes of the detection target on the image to be processed according to first information, where the first information includes the depth information and position information of each millimeter-wave detection point, and the position information is used to indicate a position at which each millimeter-wave detection point is mapped onto the image to be processed;
    where the processing module is further configured to perform non-maximum suppression (NMS) processing on the plurality of candidate boxes according to the depth information, to output a target box and a target millimeter-wave detection point, and the target box is determined according to the depth information and the position information of the target millimeter-wave detection point.
  11. The target determination apparatus according to claim 10, characterized in that the processing module is specifically configured to:
    perform NMS processing on the plurality of candidate boxes according to a first score and a second score, where the first score indicates the probability, determined by a classifier, that the detection target in each candidate box belongs to each of N categories, the N categories are preset categories, N is a positive integer, and the second score indicates the probability, determined according to a first probability distribution between the depth information and each category, that the detection target in each candidate box belongs to each of the N categories.
  12. The target determination apparatus according to claim 11, characterized in that the target determination apparatus further includes a statistics module,
    where the statistics module is configured to perform statistics on data in a first set to determine a probability distribution of a first size of the statistical targets corresponding to each category, the first set including a plurality of statistical targets corresponding to each category and size information of each statistical target; and
    to determine the first probability distribution according to the probability distribution of the first size and a first relationship, where the first relationship is a relationship between the size of a statistical target and the depth information of the millimeter-wave detection point corresponding to the statistical target.
  13. The target determination apparatus according to claim 12, characterized in that the statistics module is further configured to:
    perform statistics on data in a second set to determine a probability distribution of a second size of the statistical targets corresponding to each category, where the second set includes a plurality of statistical targets corresponding to each category and size information of each statistical target; and
    determine a second probability distribution according to the second size distribution and a second relationship, where the second probability distribution is used to update the first probability distribution, and the second relationship is a relationship between the size of a statistical target and the depth information of the millimeter-wave detection point corresponding to the statistical target.
  14. The target determination device according to claim 12 or 13, wherein the size information is height information of the statistical target.
  15. The target determination device according to any one of claims 10 to 14, wherein the position information is used, in combination with the distribution characteristics of the millimeter-wave detection points on a vehicle, to determine the position of the candidate box in the image to be processed.
  16. The target determination device according to any one of claims 10 to 15, wherein the depth information is used to determine the size of the candidate box, and the size of the candidate box is negatively correlated with the depth information.
  17. The target determination device according to any one of claims 10 to 16, wherein the processing module is further configured to:
    perform convolution processing on the image to be processed to obtain a first feature map of the image to be processed;
    extract, from the first feature map, second feature maps corresponding to the plurality of candidate boxes;
    process the second feature maps through a regression network and a classifier to obtain a first result, the first result being used for non-maximum suppression (NMS) processing.
  18. The target determination device according to any one of claims 10 to 17, wherein the image to be processed is acquired by a vision sensor, the sampling frequency of the vision sensor is a first frequency, the sampling frequency of the millimeter-wave radar is a second frequency, and the difference between the first frequency and the second frequency is not greater than a preset threshold.
  19. A smart car, wherein the smart car comprises a processor, the processor is coupled to a memory, the memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, the method according to any one of claims 1 to 9 is implemented.
  20. A monitoring device, wherein the monitoring device comprises a processor, the processor is coupled to a memory, the memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, the method according to any one of claims 1 to 9 is implemented.
  21. A computer-readable storage medium comprising a program which, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 9.
  22. A target determination system, wherein the target determination system comprises a terminal-side device and a cloud-side device,
    the terminal-side device being configured to acquire an image to be processed and a plurality of millimeter-wave detection points, wherein the image to be processed and the plurality of millimeter-wave detection points are synchronously acquired data, each millimeter-wave detection point comprises depth information, the depth information is used to indicate the distance between a detection target and a millimeter-wave radar, and the millimeter-wave radar is used to acquire the plurality of millimeter-wave detection points;
    the cloud-side device being configured to receive the image to be processed and the plurality of millimeter-wave detection points sent by the terminal-side device;
    the cloud-side device being further configured to map the plurality of millimeter-wave detection points onto the image to be processed;
    the cloud-side device being further configured to determine a plurality of candidate boxes of the detection target on the image to be processed according to first information, wherein the first information comprises the depth information and position information of each millimeter-wave detection point, and the position information is used to represent the position at which each millimeter-wave detection point is mapped on the image to be processed;
    the cloud-side device being further configured to perform non-maximum suppression (NMS) processing on the plurality of candidate boxes according to the depth information, so as to output a target box and a target millimeter-wave detection point, wherein the target box is determined according to the depth information and the position information of the target millimeter-wave detection point.
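The pipeline the claims describe — one candidate box per millimeter-wave detection point at its mapped image position, with box size negatively correlated with depth (claim 16), followed by depth-informed NMS (claim 22) — can be illustrated with a minimal Python sketch. This is not the patented implementation: the pinhole-projection constants (`focal`, `target_height`), the aspect ratio, and the `depth_gap` heuristic for using depth during suppression are illustrative assumptions that do not appear in the claims.

```python
def candidate_boxes(points, focal=1000.0, target_height=1.6):
    """One candidate box per millimeter-wave detection point.

    Each point is (u, v, depth): its mapped pixel position in the image
    plus the radar-measured range. Under a pinhole model the pixel height
    is focal * real_height / depth, so box size shrinks as depth grows.
    `focal` and `target_height` are illustrative values, not from the patent.
    """
    boxes = []
    for u, v, depth in points:
        h = focal * target_height / depth   # negatively correlated with depth
        w = 0.5 * h                         # assumed aspect ratio
        boxes.append((u - w / 2, v - h / 2, u + w / 2, v + h / 2))
    return boxes

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def depth_aware_nms(boxes, scores, depths, iou_thr=0.5, depth_gap=2.0):
    """Greedy NMS that also consults depth (one plausible reading of the claim):
    a lower-scoring box is suppressed only if it both overlaps the kept box
    and lies at a similar range, so targets at different depths survive.
    Returns kept indices, which also identify the target detection points."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order
                 if iou(boxes[i], boxes[j]) < iou_thr
                 or abs(depths[i] - depths[j]) > depth_gap]
    return keep

# Two detection points on the same near target, one on a far target:
pts = [(100, 100, 10.0), (105, 100, 10.0), (400, 100, 40.0)]
boxes = candidate_boxes(pts)
keep = depth_aware_nms(boxes, [0.9, 0.8, 0.7], [10.0, 10.0, 40.0])
# keep == [0, 2]: the overlapping same-depth boxes merge; the far box survives
```

The kept indices point back into the original detection-point list, which mirrors the claim language of outputting the target box together with its target millimeter-wave detection point.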
PCT/CN2021/094781 2020-07-17 2021-05-20 Target determination method and target determination device WO2022012158A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010692086.4 2020-07-17
CN202010692086.4A CN114022830A (en) 2020-07-17 2020-07-17 Target determination method and target determination device

Publications (1)

Publication Number Publication Date
WO2022012158A1 true WO2022012158A1 (en) 2022-01-20

Family

ID=79554481

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/094781 WO2022012158A1 (en) 2020-07-17 2021-05-20 Target determination method and target determination device

Country Status (2)

Country Link
CN (1) CN114022830A (en)
WO (1) WO2022012158A1 (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180253980A1 (en) * 2017-03-03 2018-09-06 Farrokh Mohamadi Drone Terrain Surveillance with Camera and Radar Sensor Fusion for Collision Avoidance
CN110033424A (en) * 2019-04-18 2019-07-19 北京迈格威科技有限公司 Method, apparatus, electronic equipment and the computer readable storage medium of image procossing
CN110796103A (en) * 2019-11-01 2020-02-14 邵阳学院 Target based on fast-RCNN and distance detection method thereof
CN111352112A (en) * 2020-05-08 2020-06-30 泉州装备制造研究所 Target detection method based on vision, laser radar and millimeter wave radar
CN111368706A (en) * 2020-03-02 2020-07-03 南京航空航天大学 Data fusion dynamic vehicle detection method based on millimeter wave radar and machine vision


Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11448748B2 (en) 2020-09-10 2022-09-20 Argo AI, LLC Systems and methods for simultaneous range-rate unwrapping and outlier removal for radar
US11662454B2 (en) 2020-11-02 2023-05-30 Ford Global Technologies, Llc Systems and methods for range-rate dealiasing using position consistency
CN116958510B (en) * 2022-04-19 2024-05-28 广州镭晨智能装备科技有限公司 Target detection frame acquisition method, device, equipment and storage medium
CN116958510A (en) * 2022-04-19 2023-10-27 广州镭晨智能装备科技有限公司 Target detection frame acquisition method, device, equipment and storage medium
CN114898314A (en) * 2022-04-29 2022-08-12 广州文远知行科技有限公司 Target detection method, device and equipment for driving scene and storage medium
CN115236674A (en) * 2022-06-15 2022-10-25 北京踏歌智行科技有限公司 Mining area environment sensing method based on 4D millimeter wave radar
CN115236674B (en) * 2022-06-15 2024-06-04 北京踏歌智行科技有限公司 Mining area environment sensing method based on 4D millimeter wave radar
CN115034324A (en) * 2022-06-21 2022-09-09 同济大学 Multi-sensor fusion perception efficiency enhancement method
CN115034324B (en) * 2022-06-21 2023-05-02 同济大学 Multi-sensor fusion perception efficiency enhancement method
CN115524662B (en) * 2022-10-27 2023-09-19 中国电子科技集团公司信息科学研究院 Direction finding time difference joint positioning method, system, electronic equipment and storage medium
CN115508773B (en) * 2022-10-27 2023-09-19 中国电子科技集团公司信息科学研究院 Multi-station passive positioning method and system by time difference method, electronic equipment and storage medium
CN115524662A (en) * 2022-10-27 2022-12-27 中国电子科技集团公司信息科学研究院 Direction finding time difference combined positioning method and system, electronic equipment and storage medium
CN115508773A (en) * 2022-10-27 2022-12-23 中国电子科技集团公司信息科学研究院 Time difference method multi-station passive positioning method, system, electronic equipment and storage medium
CN116125466A (en) * 2023-03-02 2023-05-16 武汉理工大学 Ship personnel hidden threat object carrying detection method and device and electronic equipment
CN116894102A (en) * 2023-06-26 2023-10-17 珠海微度芯创科技有限责任公司 Millimeter wave imaging video stream filtering method, device, equipment and storage medium
CN116894102B (en) * 2023-06-26 2024-02-20 珠海微度芯创科技有限责任公司 Millimeter wave imaging video stream filtering method, device, equipment and storage medium
CN116812590A (en) * 2023-08-29 2023-09-29 苏州双祺自动化设备股份有限公司 Visual-based unloading method and system
CN116812590B (en) * 2023-08-29 2023-11-10 苏州双祺自动化设备股份有限公司 Visual-based unloading method and system
CN117093872A (en) * 2023-10-19 2023-11-21 四川数字交通科技股份有限公司 Self-training method and system for radar target classification model
CN117093872B (en) * 2023-10-19 2024-01-02 四川数字交通科技股份有限公司 Self-training method and system for radar target classification model

Also Published As

Publication number Publication date
CN114022830A (en) 2022-02-08

Similar Documents

Publication Publication Date Title
WO2022012158A1 (en) Target determination method and target determination device
CN110059558B (en) Orchard obstacle real-time detection method based on improved SSD network
Chen et al. Attention-based context aggregation network for monocular depth estimation
WO2020244653A1 (en) Object identification method and device
CN110378381B (en) Object detection method, device and computer storage medium
Jörgensen et al. Monocular 3d object detection and box fitting trained end-to-end using intersection-over-union loss
CN109478239B (en) Method for detecting object in image and object detection system
JP6667596B2 (en) Object detection system, autonomous vehicle using the same, and object detection method thereof
Dai et al. Multi-task faster R-CNN for nighttime pedestrian detection and distance estimation
CN111401517B (en) Method and device for searching perceived network structure
CN115244421A (en) Object size estimation using camera map and/or radar information
Prophet et al. Semantic segmentation on automotive radar maps
Yao et al. Radar-camera fusion for object detection and semantic segmentation in autonomous driving: A comprehensive review
CN110210474A (en) Object detection method and device, equipment and storage medium
CN111931764A (en) Target detection method, target detection framework and related equipment
Ahmed et al. A real-time efficient object segmentation system based on U-Net using aerial drone images
CN115147333A (en) Target detection method and device
Atoum et al. Monocular video-based trailer coupler detection using multiplexer convolutional neural network
CN114998610A (en) Target detection method, device, equipment and storage medium
CN117808689A (en) Depth complement method based on fusion of millimeter wave radar and camera
CN114972492A (en) Position and pose determination method and device based on aerial view and computer storage medium
CN114972182A (en) Object detection method and device
CN115131756A (en) Target detection method and device
CN111507938B (en) Human body dangerous goods detection method and system
CN110501709B (en) Target detection system, autonomous vehicle, and target detection method thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21843071

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21843071

Country of ref document: EP

Kind code of ref document: A1