WO2022104774A1 - Target detection method and apparatus

Target detection method and apparatus

Info

Publication number
WO2022104774A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
image data
dimensional information
neural network
network model
Prior art date
Application number
PCT/CN2020/130816
Other languages
English (en)
French (fr)
Inventor
果晨阳
刘建琴
支晶晶
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to PCT/CN2020/130816 (WO2022104774A1)
Priority to CN202080005159.6A (CN112740268B)
Publication of WO2022104774A1

Classifications

    • G06T 7/0002 — Image analysis; inspection of images, e.g. flaw detection
    • G06N 3/045 — Neural networks; architecture; combinations of networks
    • G06N 3/08 — Neural networks; learning methods
    • G06T 7/12 — Segmentation; edge-based segmentation
    • G06T 7/13 — Segmentation; edge detection
    • G06T 7/136 — Segmentation; edge detection involving thresholding
    • G06T 7/74 — Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G06T 7/80 — Analysis of captured images to determine intrinsic or extrinsic camera parameters (camera calibration)
    • G06T 2207/10028 — Range image; depth image; 3D point clouds
    • G06T 2207/20081 — Training; learning
    • G06T 2207/20084 — Artificial neural networks [ANN]
    • G06T 2207/30244 — Camera pose
    • G06T 2207/30252 — Vehicle exterior; vicinity of vehicle

Definitions

  • the present application relates to the field of intelligent driving or automatic driving, and in particular, to a target detection method and device.
  • Autonomous driving technology relies on the collaboration of computer vision, radar, monitoring devices, and global positioning systems to allow motor vehicles to drive autonomously without the need for active human operation.
  • Autonomous vehicles use various computing systems to help transport passengers from one location to another. Some autonomous vehicles may require some initial or continuous input from an operator, such as a pilot, driver, or passenger.
  • An autonomous vehicle permits the operator to switch from a manual mode of operation to a self-driving mode or a mode in between.
  • Object detection is an important research topic in autonomous driving.
  • Cameras can be used to capture road conditions; objects such as obstacles, road signs, or vehicles can be identified in the image data through target detection, and the categories and locations of the objects can be obtained and used to plan autonomous driving routes.
  • A possible realization of target recognition is: training a neural network model for outputting a two-dimensional rectangular frame, where the two-dimensional rectangular frame is used to represent a target recognized in image data; after the processing device reads the image data from the camera, it uses the neural network model to output a two-dimensional rectangular frame representing the target in the image data, and thereby obtains the target recognition result.
  • the embodiments of the present application provide a target detection method and device, which relate to the fields of intelligent driving and automatic driving, and can obtain relatively accurate three-dimensional target detection results, which can comprehensively reflect the characteristics of the target.
  • An embodiment of the present application provides a target detection method, including: obtaining point cloud data by using image data; outputting first three-dimensional information of the point cloud data by using a first neural network model, where the first three-dimensional information includes information used to represent at least one first solid frame of at least one first target in the image data, the information of the first solid frame includes first coordinates used to represent the position of the first solid frame, and the first solid frame is used to frame the first target; outputting two-dimensional information of the image data by using a second neural network model, where the two-dimensional information includes information used to represent at least one plane frame of at least one second target in the image data, the information of the plane frame includes coordinates used to represent the position of the plane frame, and the plane frame is used to frame the second target; determining second three-dimensional information according to depth information of the image data and the two-dimensional information of the image data, where the second three-dimensional information includes information used to represent at least one second solid frame of at least one second target in the image data, the information of the second solid frame includes second coordinates used to represent the position of the second solid frame, and the second solid frame is used to frame the second target; and fusing the same target in the first three-dimensional information and the second three-dimensional information to obtain a target detection result.
  • When fusing, the weight of the information used to represent the depth of the image data in the first three-dimensional information is greater than the weight of the information used to represent the depth of the image data in the second three-dimensional information, and the weight of the information used to represent the image data plane in the first three-dimensional information is smaller than the weight of the information used to represent the image data plane in the second three-dimensional information.
  • the respective advantages of the first three-dimensional information and the second three-dimensional information can be combined to obtain a more accurate target detection result, and the target detection result is a three-dimensional result, which can more comprehensively reflect the characteristics of the target.
  • Fusing the same target in the first three-dimensional information and the second three-dimensional information to obtain the target detection result includes: using a third neural network model to fuse the first three-dimensional information and the second three-dimensional information to obtain the target detection result, wherein, in the loss function of the third neural network model, the weight of the information used to represent the depth of the image data in the first three-dimensional information is greater than the weight of the information used to represent the depth of the image data in the second three-dimensional information, and the weight of the information used to represent the image data plane in the first three-dimensional information is smaller than the weight of the information used to represent the image data plane in the second three-dimensional information. In this way, a relatively accurate target detection result can be obtained based on the respective advantages of the first three-dimensional information and the second three-dimensional information.
  • The loss function of the third neural network model is related to one or more of the following: the confidence of the first neural network model, the confidence of the second neural network model, the intersection-over-union between the output results of the first neural network model and the real samples of the first neural network model, the intersection-over-union between the output results of the second neural network model and the real samples of the second neural network model, the normalized value of the data in the first neural network model, or the normalized value of the data in the second neural network model. In this way, a more effective third neural network model can be obtained.
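  • As a minimal illustrative sketch (not taken from the patent), the intersection-over-union term for two solid frames can be computed as below, assuming axis-aligned boxes parameterized as (x, y, z, w, h, l); the function name and parameterization are assumptions for illustration only:

```python
import numpy as np

def iou_3d_axis_aligned(box_a, box_b):
    """IoU of two axis-aligned 3D boxes given as (x, y, z, w, h, l),
    where (x, y, z) is the box center and (w, h, l) its size.
    Illustrative only; the patent does not fix this parameterization."""
    def to_min_max(box):
        c = np.asarray(box[:3], dtype=float)
        s = np.asarray(box[3:6], dtype=float)
        return c - s / 2.0, c + s / 2.0

    a_min, a_max = to_min_max(box_a)
    b_min, b_max = to_min_max(box_b)

    # Overlap along each axis (zero if the boxes are disjoint on that axis).
    overlap = np.maximum(0.0, np.minimum(a_max, b_max) - np.maximum(a_min, b_min))
    inter = float(np.prod(overlap))
    vol_a = float(np.prod(a_max - a_min))
    vol_b = float(np.prod(b_max - b_min))
    union = vol_a + vol_b - inter
    return inter / union if union > 0 else 0.0

# Example: two unit cubes offset by half a unit along x -> IoU ≈ 0.333.
print(iou_3d_axis_aligned((0, 0, 0, 1, 1, 1), (0.5, 0, 0, 1, 1, 1)))
```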
  • The first three-dimensional information includes (X1, Y1, Z1, W1, H1, L1), where X1, Y1, Z1 are the first coordinates and W1, H1, L1 represent the size of the first solid frame; the second three-dimensional information includes (X2, Y2, Z2, W2, H2, L2), where X2, Y2, Z2 are the second coordinates and W2, H2, L2 represent the size of the second solid frame; and the loss function loss satisfies the following formula:
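  • The formula itself is not reproduced in this text. Purely as an illustrative assumption consistent with the weighting described earlier, and not the patent's actual formula, a fused loss of this general shape could be written as loss = α·‖Z − Z₁‖² + (1 − α)·‖Z − Z₂‖² + β·‖(X, Y) − (X₂, Y₂)‖² + (1 − β)·‖(X, Y) − (X₁, Y₁)‖², with α, β ∈ (0.5, 1], where (X, Y, Z) denotes the fused position: the depth term then leans more heavily on the first (point-cloud-derived) three-dimensional information and the plane term leans more heavily on the second (image-derived) three-dimensional information; the size components (W, H, L) could be weighted analogously.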
  • using the image data to obtain the point cloud data includes: performing three-dimensional reconstruction on the image data to obtain the point cloud data.
  • In this way, the point cloud data corresponding to the image can be conveniently obtained from the image data by using three-dimensional reconstruction.
  • The image data is captured during the automatic driving process, and determining the second three-dimensional information according to the depth information of the image data and the two-dimensional information of the image data includes: acquiring adjacent image data of the image data captured during the automatic driving process; calculating the depth information of the image data by using the image data and the adjacent image data; and fusing the depth information of the image data and the two-dimensional information of the image data to obtain the second three-dimensional information. In this way, the depth information corresponding to an image can be conveniently obtained from the image and its adjacent images.
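  • A minimal sketch of how depth might be estimated from a pair of adjacent frames, assuming they can be rectified into an approximate stereo pair with known focal length and baseline; the patent does not prescribe a specific algorithm, and OpenCV's semi-global matcher is used here purely for illustration:

```python
import cv2
import numpy as np

def depth_from_adjacent_frames(img_prev, img_curr, focal_px, baseline_m):
    """Estimate a dense depth map from two adjacent (rectified) frames.
    Illustrative assumption: the frames behave like a calibrated stereo pair."""
    gray_prev = cv2.cvtColor(img_prev, cv2.COLOR_BGR2GRAY)
    gray_curr = cv2.cvtColor(img_curr, cv2.COLOR_BGR2GRAY)

    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
    disparity = matcher.compute(gray_prev, gray_curr).astype(np.float32) / 16.0

    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]  # depth = f * B / d
    return depth
```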
  • the method further includes: updating the landmark element in the high-precision map according to the target detection result. In this way, a more real-time and accurate high-precision map can be obtained.
  • updating the landmark element in the high-precision map according to the target detection result includes: determining the landmark detection result used to represent the landmark in the target detection result; determining the location of the landmark detection result in the high-precision map; Add landmarks to the HD map based on the location of the landmark detection result in the HD map. In this way, a more real-time and accurate high-precision map can be obtained.
  • the method further includes: determining an automatic driving strategy according to the target detection result. In this way, the target detection result can be used to guide the automatic driving of the vehicle more accurately.
  • an embodiment of the present application provides a target detection apparatus.
  • the object detection device can be a vehicle with object detection function, or other components with object detection function.
  • The target detection device includes but is not limited to: a vehicle-mounted terminal, a vehicle-mounted controller, a vehicle-mounted module, vehicle-mounted components, a vehicle-mounted chip, a vehicle-mounted unit, or a vehicle-mounted camera or other sensor; the vehicle can implement the methods provided in this application through the vehicle-mounted terminal, vehicle-mounted controller, vehicle-mounted module, vehicle-mounted components, vehicle-mounted chip, vehicle-mounted unit, or camera.
  • the target detection device may be an intelligent terminal, or be arranged in other intelligent terminals with target detection function other than the vehicle, or be arranged in a component of the intelligent terminal.
  • the intelligent terminal may be other terminal equipment such as intelligent transportation equipment, smart home equipment, and robots.
  • the target detection device includes, but is not limited to, a smart terminal or other sensors such as a controller, a chip or a camera, and other components in the smart terminal.
  • the target detection device may be a general-purpose device or a special-purpose device.
  • The apparatus can also be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, or another device with processing functions.
  • the embodiment of the present application does not limit the type of the target detection device.
  • the target detection device may also be a chip or processor with a processing function, and the target detection device may include at least one processor.
  • the processor can be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • the chip or processor with processing function may be arranged in the sensor, or may not be arranged in the sensor, but arranged at the receiving end of the output signal of the sensor.
  • The processor includes at least one of a central processing unit (CPU), a graphics processing unit (GPU), a micro control unit (MCU), a microprocessor unit (MPU), or a coprocessor.
  • the target detection apparatus may also be a terminal device, or a chip or a chip system in the terminal device.
  • the target detection apparatus may include a processing unit and a communication unit.
  • the processing unit may be a processor.
  • the target detection apparatus may further include a storage unit, which may be a memory. The storage unit is used for storing instructions, and the processing unit executes the instructions stored in the storage unit, so that the terminal device implements a target detection method described in the first aspect or any possible implementation manner of the first aspect.
  • the processing unit may be a processor.
  • the processing unit executes the instructions stored in the storage unit, so that the terminal device implements a target detection method described in the first aspect or any possible implementation manner of the first aspect.
  • the storage unit may be a storage unit (eg, a register, a cache, etc.) in the chip, or a storage unit (eg, a read-only memory, a random access memory, etc.) located outside the chip in the terminal device.
  • The processing unit is configured to obtain point cloud data by using the image data; the processing unit is further configured to output the first three-dimensional information of the point cloud data by using the first neural network model, where the first three-dimensional information includes information used to represent at least one first solid frame of at least one first target in the image data, the information of the first solid frame includes first coordinates used to represent the position of the first solid frame, and the first solid frame is used to frame the first target; the processing unit is further configured to output the two-dimensional information of the image data by using the second neural network model, where the two-dimensional information includes information used to represent at least one plane frame of at least one second target in the image data, the information of the plane frame includes coordinates used to represent the position of the plane frame, and the plane frame is used to frame the second target; the processing unit is further configured to determine the second three-dimensional information according to the depth information of the image data and the two-dimensional information of the image data, where the second three-dimensional information includes information used to represent at least one second solid frame of at least one second target in the image data, the information of the second solid frame includes second coordinates used to represent the position of the second solid frame, and the second solid frame is used to frame the second target; and the processing unit is further configured to fuse the same target in the first three-dimensional information and the second three-dimensional information to obtain a target detection result.
  • the processing unit is specifically configured to use the third neural network model to fuse the first three-dimensional information and the second three-dimensional information to obtain the target detection result; wherein, in the loss function of the third neural network model, the third The weight of the information used to represent the depth of the image data in the first three-dimensional information is greater than the weight of the information used to represent the depth of the image data in the second three-dimensional information, and the weight of the information used to represent the image data plane in the first three-dimensional information is smaller than that of the second three-dimensional information. The weight of the information used to represent the image data plane in the information.
  • The loss function of the third neural network model is related to one or more of the following: the confidence of the first neural network model, the confidence of the second neural network model, the intersection-over-union between the output results of the first neural network model and the real samples of the first neural network model, the intersection-over-union between the output results of the second neural network model and the real samples of the second neural network model, the normalized value of the data in the first neural network model, or the normalized value of the data in the second neural network model.
  • The first three-dimensional information includes (X1, Y1, Z1, W1, H1, L1), where X1, Y1, Z1 are the first coordinates and W1, H1, L1 represent the size of the first solid frame; the second three-dimensional information includes (X2, Y2, Z2, W2, H2, L2), where X2, Y2, Z2 are the second coordinates and W2, H2, L2 represent the size of the second solid frame; and the loss function loss satisfies the same formula as in the method of the first aspect.
  • the processing unit is specifically configured to perform three-dimensional reconstruction on the image data to obtain point cloud data.
  • the processing unit is specifically configured to acquire adjacent image data of the image data captured in the process of automatic driving; use the image data and the adjacent image data of the image data to calculate the depth information of the image data; fusion; The depth information of the image data and the two-dimensional information of the image data are used to obtain second three-dimensional information.
  • the processing unit is further configured to update the landmark element in the high-precision map according to the target detection result.
  • the processing unit is specifically configured to determine the landmark detection result used to represent the landmark in the target detection result; determine the location of the landmark detection result in the high-precision map; according to the landmark detection result in the high-precision map location, and add landmarks in the high-definition map.
  • the processing unit is further configured to determine the automatic driving strategy according to the target detection result.
  • Embodiments of the present application further provide a sensor system for providing a vehicle with a target detection function. The system includes at least one target detection device mentioned in the above embodiments of the present application and other sensors such as cameras. At least one sensor device in the system can be integrated into a complete machine or equipment, or at least one sensor device in the system can be set up independently as an element or device.
  • The embodiments of the present application further provide a system applied to unmanned driving or intelligent driving, which includes at least one target detection device mentioned in the above embodiments of the present application and at least one sensor such as a camera. At least one device in the system can be integrated into a whole machine or equipment, or at least one device in the system can be set up independently as a component or device.
  • any of the above systems may interact with the vehicle's central controller to provide detection and/or fusion information for vehicle driving decisions or control.
  • an embodiment of the present application further provides a terminal, where the terminal includes at least one target detection apparatus mentioned in the above-mentioned embodiments of the present application or any of the above-mentioned systems.
  • the terminal may be smart home equipment, smart manufacturing equipment, smart industrial equipment, smart transportation equipment (including drones, vehicles, etc.), and the like.
  • An embodiment of the present application further provides a chip, including at least one processor and an interface; the interface is used to provide program instructions or data for the at least one processor; the at least one processor is used to execute the program instructions to implement the method of the first aspect or any possible implementation of the first aspect.
  • an embodiment of the present application provides a target detection apparatus, including at least one processor configured to call a program in a memory to implement any method in the first aspect or any possible implementation manner of the first aspect.
  • an embodiment of the present application provides a target detection apparatus, including: at least one processor and an interface circuit, where the interface circuit is configured to provide information input and/or information output for the at least one processor; at least one processor is configured to run code instructions to implement any method of the first aspect or any possible implementations of the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores instructions, when the instructions are executed, to implement the first aspect or any of the possible implementations of the first aspect. a method.
  • FIG. 1 is a functional block diagram of a vehicle 100 according to an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a computer system 112 according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a chip hardware structure provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an automatic driving scenario provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a scenario in which an independent computing device is used for target detection according to an embodiment of the present application
  • FIG. 6 is a schematic flowchart of a target detection method provided by an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of obtaining a target detection result according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a target detection apparatus provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a chip according to an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of another target detection apparatus provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a vehicle according to an embodiment of the application.
  • words such as “first” and “second” are used to distinguish the same or similar items that have basically the same function and effect.
  • the first value and the second value are only used to distinguish different values, and do not limit their order.
  • the words “first”, “second” and the like do not limit the quantity and execution order, and the words “first”, “second” and the like are not necessarily different.
  • At least one means one or more
  • plural means two or more.
  • And/or which describes the association relationship of the associated objects, indicates that there can be three kinds of relationships, for example, A and/or B, which can indicate: the existence of A alone, the existence of A and B at the same time, and the existence of B alone, where A, B can be singular or plural.
  • the character “/” generally indicates that the associated objects are an “or” relationship.
  • “At least one of the following items” or similar expressions refer to any combination of these items, including any combination of a single item or a plurality of items.
  • For example, at least one of a, b, or c can represent: a, b, c, a and b, a and c, b and c, or a, b and c, where each of a, b, and c may be singular or plural.
  • Target detection can find all objects of interest in image data or point cloud data.
  • target detection may include two subtasks of target localization and target classification, and the category and location of the target may be determined based on the two subtasks.
  • Possible implementations of target detection by electronic devices may include target detection based on cameras and target detection based on radar.
  • Image data can be obtained by shooting with a camera, and the electronic device can then identify targets in the image data. Since image data can relatively accurately reflect the size of objects, the plane information of a target obtained by recognizing image data is relatively accurate. Therefore, in a usual implementation, when a camera is used to achieve target detection, recognizing the image data can usually yield information of one or more plane frames, and each plane rectangular frame can frame a recognized target.
  • Radar can be used to obtain three-dimensional point cloud data, and the electronic device can identify targets in the three-dimensional point cloud data. Because radar can obtain accurate speed and position information, has a long field of view, and yields relatively accurate depth information for recognized targets, in a usual implementation, when radar is used to achieve target detection, recognizing the point cloud information can usually yield information of one or more solid frames; each solid frame can frame a recognized target, and the solid frame can reflect the depth information of the target.
  • However, in camera-based target detection, the output target detection results usually have no depth information, and the three-dimensional shape of the target cannot be comprehensively considered, so it may not be possible to obtain a good driving strategy.
  • In radar-based target detection, when the radar is located in an environment with clutter or ground fluctuations, the radar is affected by the clutter or ground fluctuations, resulting in inaccurate plane information derived from the point clouds. Therefore, radar-based target detection is less effective at identifying targets.
  • In view of this, the embodiments of the present application provide a target detection method and device, which can obtain point cloud data by using image data and use a first neural network model to output first three-dimensional information of the point cloud data.
  • the information representing the depth in the first three-dimensional information is relatively accurate.
  • Since the second neural network model is used to output the two-dimensional information of the image data, the information representing the plane in the two-dimensional information is relatively accurate.
  • After the two-dimensional information is converted into the second three-dimensional information, the information representing the plane in the second three-dimensional information is also relatively accurate. When fusing the same target in the first three-dimensional information and the second three-dimensional information, the weight of the information used to represent the depth of the image data in the first three-dimensional information is made greater than the weight of the information used to represent the depth of the image data in the second three-dimensional information, and the weight of the information used to represent the image data plane in the first three-dimensional information is made smaller than the weight of the information used to represent the image data plane in the second three-dimensional information.
  • Such weighting combines the respective advantages of the first three-dimensional information and the second three-dimensional information to obtain a more accurate target detection result, and the target detection result is a three-dimensional result, which can more comprehensively reflect the characteristics of the target.
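  • Read together, these steps form a pipeline. The sketch below is a structural outline only: the three models (model_1, model_2, model_3) and the helper functions are placeholders introduced for illustration, not components defined by the patent:

```python
def detect_targets(image, model_1, model_2, model_3, reconstruct_point_cloud,
                   estimate_depth, lift_to_3d):
    """Structural sketch of the described pipeline (placeholders throughout)."""
    # 1. Obtain point cloud data from the image data (e.g. by 3D reconstruction).
    point_cloud = reconstruct_point_cloud(image)

    # 2. First neural network model: solid frames from the point cloud
    #    (depth-related components are expected to be the more accurate ones).
    first_3d_info = model_1(point_cloud)

    # 3. Second neural network model: plane frames from the image
    #    (plane-related components are expected to be the more accurate ones).
    two_d_info = model_2(image)

    # 4. Combine the plane frames with estimated depth to get second solid frames.
    depth = estimate_depth(image)
    second_3d_info = lift_to_3d(two_d_info, depth)

    # 5. Third neural network model: fuse the same target from both sources,
    #    weighting depth toward first_3d_info and plane toward second_3d_info.
    return model_3(first_3d_info, second_3d_info)
```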
  • the target detection method in the embodiment of the present application may be applied to scenarios such as automatic driving, security protection, or monitoring.
  • objects such as obstacles can be detected based on the target detection method of the embodiment of the present application, and then an automatic driving strategy can be formulated based on the target detection results.
  • the target detection method of the embodiment of the present application can be used to detect targets such as people, and then based on the target detection, an alarm for unsafe factors such as illegal intrusion can be issued.
  • the target detection method in this embodiment of the present application may be applied to a vehicle, or a chip in a vehicle, or the like.
  • FIG. 1 shows a functional block diagram of a vehicle 100 provided by an embodiment of the present application.
  • the vehicle 100 is configured in a fully or partially autonomous driving mode.
  • The vehicle 100 may also determine the current state of the vehicle and its surrounding environment through human operation while in the autonomous driving mode, such as determining the possible behavior of at least one other vehicle in the surrounding environment, determining a confidence level corresponding to the likelihood that the other vehicle will perform the possible behavior, and controlling the vehicle 100 based on the determined information.
  • the vehicle 100 may be set to perform driving-related operations automatically without requiring human interaction.
  • Vehicle 100 may include various subsystems, such as travel system 102 , sensor system 104 , control system 106 , one or more peripherals 108 and power supply 110 , computer system 112 , and user interface 116 .
  • vehicle 100 may include more or fewer subsystems, and each subsystem may include multiple elements. Additionally, each of the subsystems and elements of the vehicle 100 may be interconnected by wire or wirelessly.
  • the sensor system 104 may include several sensors that sense information about the environment surrounding the vehicle 100 .
  • the sensor system 104 may include a positioning system 122 (which may be a GPS system, a Beidou system or other positioning system), an inertial measurement unit (IMU) 124, a radar 126, a laser rangefinder 128, and camera 130.
  • the sensor system 104 may also include sensors of the internal systems of the vehicle 100 being monitored (eg, an in-vehicle air quality monitor, a fuel gauge, an oil temperature gauge, etc.). Sensor data from one or more of these sensors can be used to detect objects and their corresponding characteristics (position, shape, orientation, velocity, etc.). This detection and identification is a critical function for the safe operation of the autonomous vehicle 100 .
  • the positioning system 122 may be used to estimate the geographic location of the vehicle 100 .
  • the IMU 124 is used to sense position and orientation changes of the vehicle 100 based on inertial acceleration.
  • IMU 124 may be a combination of an accelerometer and a gyroscope.
  • Radar 126 may utilize radio signals to sense objects within the surrounding environment of vehicle 100 . In some embodiments, in addition to sensing objects, radar 126 may be used to sense the speed and/or heading of objects.
  • the laser rangefinder 128 may utilize laser light to sense objects in the environment in which the vehicle 100 is located.
  • the laser rangefinder 128 may include one or more laser sources, laser scanners, and one or more detectors, among other system components.
  • Camera 130 may be used to capture multiple images of the surrounding environment of vehicle 100 .
  • Camera 130 may be a still camera or a video camera.
  • Control system 106 controls the operation of the vehicle 100 and its components.
  • Control system 106 may include various elements including steering system 132 , throttle 134 , braking unit 136 , sensor fusion algorithms 138 , computer vision system 140 , route control system 142 , and obstacle avoidance system 144 .
  • Computer vision system 140 may be operable to process and analyze images captured by camera 130 in order to identify objects and/or features in the environment surrounding vehicle 100 .
  • Objects and/or features may include traffic signals, road boundaries, and obstacles.
  • Computer vision system 140 may use object recognition algorithms, structure from motion (SFM) algorithms, video tracking, and other computer vision techniques.
  • the computer vision system 140 may be used to map the environment, track objects, estimate the speed of objects, and the like.
  • the route control system 142 is used to determine the travel route of the vehicle 100 .
  • route control system 142 may combine data from sensors 138 , global positioning system (GPS) 122 , and one or more predetermined maps to determine a route for vehicle 100 .
  • the obstacle avoidance system 144 is used to identify, evaluate, and avoid or otherwise traverse potential obstacles in the environment of the vehicle 100 .
  • control system 106 may additionally or alternatively include components other than those shown and described. Alternatively, some of the components shown above may be reduced.
  • Peripherals 108 may include a wireless communication system 146 , an onboard computer 148 , a microphone 150 and/or a speaker 152 .
  • peripherals 108 provide a means for a user of vehicle 100 to interact with user interface 116 .
  • the onboard computer 148 may provide information to the user of the vehicle 100 .
  • User interface 116 may also operate on-board computer 148 to receive user input.
  • the onboard computer 148 can be operated via a touch screen.
  • peripheral devices 108 may provide a means for vehicle 100 to communicate with other devices located within the vehicle.
  • microphone 150 may receive audio (eg, voice commands or other audio input) from a user of vehicle 100 .
  • speakers 152 may output audio to a user of vehicle 100 .
  • the display screen of the on-board computer 148 may also display the target tracked by the target detection algorithm according to the embodiment of the present application, so that the user can perceive the environment around the vehicle on the display screen.
  • Wireless communication system 146 may wirelessly communicate with one or more devices, either directly or via a communication network.
  • Computer system 112 may include at least one processor 113 that executes instructions 115 stored in a non-transitory computer-readable medium such as data storage device 114 .
  • Computer system 112 may also be multiple computing devices that control individual components or subsystems of vehicle 100 in a distributed fashion.
  • the processor 113 may be any conventional processor, such as a commercially available central processing unit (CPU). Alternatively, the processor may be a special-purpose device such as an application specific integrated circuit (ASIC) or other hardware-based processor for use in a specific application.
  • Although FIG. 1 functionally illustrates the processor, memory, and other elements of the computer system 112 in the same block, those of ordinary skill in the art will understand that the processor, computer, or memory may actually include multiple processors, computers, or memories that may or may not be stored within the same physical enclosure.
  • a processor may be located remotely from the vehicle and in wireless communication with the vehicle. In other aspects, some of the processes described herein are performed on a processor disposed within the vehicle while others are performed by a remote processor, including taking steps necessary to perform a single maneuver.
  • data storage 114 may include instructions 115 (eg, program logic) executable by processor 113 to perform various functions of vehicle 100 , including those described above.
  • Data storage 114 may also contain additional instructions, including instructions to send data to, receive data from, interact with, and/or issue control commands to one or more of the travel system 102, sensor system 104, control system 106, and peripherals 108.
  • the data storage device 114 may store data such as road maps, route information, the vehicle's position, direction, speed, and other such vehicle data, among other information. Such information may be used by the vehicle 100 and the computer system 112 during operation of the vehicle 100 in autonomous, semi-autonomous and/or manual modes.
  • a user interface 116 for providing information to or receiving information from a user of the vehicle 100 .
  • user interface 116 may include one or more input/output devices within the set of peripheral devices 108 , such as wireless communication system 146 , onboard computer 148 , microphone 150 and speaker 152 .
  • Computer system 112 may control functions of vehicle 100 based on input received from various subsystems (eg, travel system 102 , sensor system 104 , and control system 106 ) and from user interface 116 .
  • computer system 112 may utilize input from control system 106 in order to control steering unit 132 to avoid obstacles detected by sensor system 104 and obstacle avoidance system 144 .
  • computer system 112 is operable to provide control of various aspects of vehicle 100 and its subsystems.
  • one or more of these components described above may be installed or associated with the vehicle 100 separately.
  • data storage device 114 may exist partially or completely separate from vehicle 100 .
  • the above-described components may be communicatively coupled together in a wired and/or wireless manner.
  • FIG. 1 should not be construed as a limitation on the embodiments of the present application.
  • An autonomous vehicle traveling on a road can track objects in its surroundings according to the object detection method of the embodiment of the present application to determine its own adjustment to the current speed or the driving route.
  • the object may be other vehicles, traffic control equipment, or other types of objects.
  • The computing device may also provide instructions to modify the steering angle of the vehicle 100 so that the self-driving car follows a given trajectory and/or maintains safe lateral and longitudinal distances from obstacles close to the self-driving car (e.g., vehicles in adjacent lanes on the road).
  • the above-mentioned vehicle 100 can be a car, a truck, a motorcycle, a bus, a boat, an airplane, a helicopter, a lawn mower, a recreational vehicle, a playground vehicle, construction equipment, a tram, a golf cart, a train, a cart, etc.
  • The type of vehicle is not particularly limited in the embodiments of the present application.
  • FIG. 2 is a schematic structural diagram of the computer system 112 in FIG. 1 .
  • computer system 112 includes processor 113 coupled to system bus 105 .
  • the processor 113 may be one or more processors, each of which may include one or more processor cores.
  • a video adapter 107 which can drive a display 109, is coupled to the system bus 105.
  • the system bus 105 is coupled to an input-output (I/O) bus through a bus bridge 111 .
  • I/O interface 115 is coupled to the I/O bus.
  • I/O interface 115 communicates with various I/O devices, such as input device 117 (eg, keyboard, mouse, touch screen, etc.), media tray 121, (eg, CD-ROM, multimedia interface, etc.).
  • Transceiver 123 (which can send and/or receive radio communication signals), camera 155 (which can capture still and moving digital video images) and external USB interface 125 .
  • the interface connected to the I/O interface 115 may be a universal serial bus (universal serial bus, USB) interface.
  • the processor 113 may be any conventional processor, including a reduced instruction set computing (“RISC”) processor, a complex instruction set computing (“CISC”) processor, or a combination thereof.
  • the processor may be a special purpose device such as an application specific integrated circuit (“ASIC").
  • the processor 113 may be a neural network processor or a combination of a neural network processor and the above-mentioned conventional processors.
  • the computer system may be located remotely from the autonomous vehicle and may communicate wirelessly with the autonomous vehicle.
  • some of the processes described herein are performed on a processor disposed within the autonomous vehicle, others are performed by a remote processor, including taking actions required to perform a single maneuver.
  • Network interface 129 is a hardware network interface, such as a network card.
  • the network 127 may be an external network, such as the Internet, or an internal network, such as an Ethernet network or a virtual private network (VPN).
  • the network 127 may also be a wireless network, such as a WiFi network, a cellular network, and the like.
  • the hard drive interface 131 is coupled to the system bus 105 .
  • the hard disk drive interface 131 is connected to the hard disk drive 133 .
  • System memory 135 is coupled to system bus 105 .
  • Software running in system memory 135 may include an operating system (OS) 137 and application programs 143 of computer system 112 .
  • the operating system includes a shell 139 and a kernel 141 .
  • the shell 139 is an interface between the user and the kernel of the operating system.
  • the shell is the outermost layer of the operating system. The shell manages the interaction between the user and the operating system: waiting for user input, interpreting user input to the operating system, and processing various operating system output.
  • Kernel 141 consists of those parts of the operating system that manage memory, files, peripherals, and system resources. Interacting directly with hardware, the operating system's kernel 141 typically runs processes and provides inter-process communication, providing CPU time slice management, interrupts, memory management, IO management, and the like.
  • Application program 143 includes programs related to controlling the automatic driving of the car, for example, programs that manage the interaction of the self-driving car with obstacles on the road, programs that control the route or speed of the self-driving car, and programs that control the interaction between the self-driving car and other self-driving cars on the road.
  • Application program 143 also exists on the system of the software deploying server 149.
  • The computer system may download the application program 143 from the deploying server 149 when the application program 143 needs to be executed.
  • Sensor 153 is associated with a computer system. Sensor 153 is used to detect the environment around computer system 112 .
  • The sensor 153 can detect animals, cars, obstacles, pedestrian crossings, and the like. Further, the sensor can also detect the environment around the above-mentioned animals, cars, obstacles, and pedestrian crossings, such as the environment around an animal, for example, other animals appearing around the animal, weather conditions, and ambient light levels.
  • the sensors may be cameras, infrared sensors, chemical detectors, microphones, and the like.
  • FIG. 3 is a schematic diagram of a chip hardware structure provided by an embodiment of the present application.
  • the chip may include a neural network processor 30 .
  • the chip can be applied to the vehicle shown in FIG. 1 or the computer system shown in FIG. 2 .
  • The neural network processor 30 may be a neural network processing unit (NPU), a tensor processing unit (TPU), a graphics processing unit (GPU), or another processor suitable for large-scale operation processing (for example, XOR operation processing).
  • the arithmetic circuit 303 includes multiple processing units (process engines, PEs). In some implementations, arithmetic circuit 303 is a two-dimensional systolic array. The arithmetic circuit 303 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, arithmetic circuit 303 is a general-purpose matrix processor.
  • Unified memory 306 is used to store input data and output data.
  • the weight data is directly transferred to the weight memory 302 through a storage unit access controller (direct memory access controller, DMAC) 305 .
  • Input data is also moved to unified memory 306 via the DMAC.
  • The bus interface unit (BIU) 310 is used for the interaction between the DMAC and the instruction fetch buffer 309; the bus interface unit 310 is also used for the instruction fetch buffer 309 to obtain instructions from the external memory, and for the storage unit access controller 305 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • the instruction fetch memory (instruction fetch buffer) 309 connected to the controller 304 is used to store the instructions used by the controller 304;
  • the unified memory 306, the input memory 301, the weight memory 302 and the instruction fetch memory 309 are all On-Chip memories. External memory is independent of the NPU hardware architecture.
  • FIG. 4 is a schematic diagram of an automatic driving scenario provided by an embodiment of the present application.
  • the autonomous vehicle 401 can detect surrounding objects according to its vehicle's sensors, such as cameras, radars, and the like. For example, the autonomous vehicle 401 may recognize other objects such as vehicles 4022, vehicles 4032, and speed limit signs 4042 around its vehicle.
  • The automatic driving vehicle 401 can use a camera to photograph surrounding objects, and the processor in the automatic driving vehicle can use the target detection method provided in the embodiments of the present application to identify the objects in the automatic driving scene and obtain one or more solid frames corresponding to one or more targets, where each solid frame can frame a target recognized by the autonomous driving vehicle 401.
  • the autonomous vehicle 401 may frame the vehicle 4022 by the solid frame 4021 , frame the vehicle 4032 by the solid frame 4031 , and frame the speed limit sign 4042 by the solid frame 4041 .
  • the subsequent automatic driving vehicle 401 can plan the automatic driving route according to the recognized target and other automatic driving data such as lane data, so as to ensure the normal driving of the automatic driving vehicle 401 .
  • FIG. 5 is a schematic diagram of a scene in which an independent computing device is used for target detection according to an embodiment of the present application.
  • the scenario may include: an autonomous vehicle 501, a wireless wide area network (wide area network, WAN) 502, a communication network 503 and a server 504.
  • the autonomous vehicle 501 may include one or more cameras, or devices such as wireless transceivers.
  • The wireless transceiver of the autonomous vehicle is capable of exchanging data and communicating as needed with the wireless WAN 502 in this scenario.
  • The autonomous driving system in the autonomous vehicle 501 may use the wireless WAN 502 to transmit image data captured by a camera in the autonomous vehicle, or other data received by other sensors, via one or more communication networks 503 (e.g., the Internet) to the server 504 for processing.
  • the server 504 then transmits the processed data to the automatic driving system of the automatic driving vehicle 501 for guiding the automatic driving of the vehicle.
  • the server 504 may be one or more servers.
  • The camera described in the embodiments of this application can project an optical image generated by an object through a lens onto the surface of an image sensor, convert it into an electrical signal, and convert the electrical signal into a digital image signal after analog-to-digital conversion; the digital image signal can then be processed in a digital signal processing (DSP) chip.
  • the camera may include a monocular camera, a binocular camera, and the like.
  • the point cloud data described in the embodiments of the present application may be: a set representing a set of vectors in a three-dimensional coordinate system. These vectors are usually represented in the form of three-dimensional coordinates (x, y, and z dimensions). Point cloud data is mainly used to represent the outer surface characteristics of the target object. Each point in the point cloud data contains three-dimensional coordinates. The size of the target and the depth information of the target can be obtained based on more point cloud data, and then combined with the plane information of the image data corresponding to the point cloud data, more accurate target detection results can be obtained.
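  • For instance, with a target's point cloud represented as an N×3 array, its size and depth can be read off from an axis-aligned bounding box of the points. The snippet below is a simplified illustration; the function name and the axis-aligned assumption are not from the patent:

```python
import numpy as np

def bounding_box_from_points(points):
    """points: (N, 3) array of x, y, z coordinates belonging to one target.
    Returns the center (x, y, z) and size (w, h, l) of an axis-aligned box."""
    points = np.asarray(points, dtype=float)
    p_min, p_max = points.min(axis=0), points.max(axis=0)
    center = (p_min + p_max) / 2.0  # position of the solid frame
    size = p_max - p_min            # width, height, length of the solid frame
    return center, size
```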
  • The depth information described in the embodiments of the present application may be information indicating the distance from each point in the scene to the camera plane, which reflects the geometric shape of the visible surfaces in the scene.
  • the three-dimensional representation of the object can be obtained by using the depth information and plane information of the object.
  • the neural network described in the embodiments of the present application may be a mathematical model or a computational model that imitates the structure and function of a biological neural network, and is used for estimating or approximating a function.
  • the neural network model needs to be trained with a large number of samples. After the model is trained, the neural network model can be used to make predictions.
  • The three-dimensional reconstruction (3D reconstruction) described in the embodiments of the present application may include a process of reconstructing three-dimensional information from single-view or multi-view image data; the three-dimensional reconstruction technology can describe a real scene or object as a mathematical model suitable for computer representation and processing, so that it can be processed, manipulated, and analyzed in a computer environment.
  • the focus of 3D reconstruction technology is to obtain the depth information of the target scene or object. Under the condition that the depth information of the scene is known, the three-dimensional reconstruction of the scene can be realized only through the registration and fusion of the point cloud data.
  • the reconstructed 3D model is usually relatively complete and has a high degree of authenticity, so it has been widely used.
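  • Where depth information is available, the registration and fusion of point clouds can be done with standard tooling. The snippet below is a hedged illustration using Open3D's point-to-point ICP; the library choice, parameters, and function names are assumptions, not specified by the patent:

```python
import numpy as np
import open3d as o3d

def register_and_merge(points_a, points_b, max_dist=0.05):
    """Align point cloud B to point cloud A with point-to-point ICP and merge them.
    points_a, points_b: (N, 3) numpy arrays. Illustration only."""
    pcd_a = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points_a))
    pcd_b = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points_b))

    result = o3d.pipelines.registration.registration_icp(
        pcd_b, pcd_a, max_dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())

    pcd_b.transform(result.transformation)  # apply the estimated rigid transform
    return pcd_a + pcd_b                    # merged (fused) point cloud
```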
  • The high definition map (HD map) described in the embodiments of the present application may be machine-oriented map data for use by autonomous vehicles. It can more accurately describe road traffic information elements and more realistically reflect the actual situation of the road. High-precision maps can provide high-precision positioning, road-level and lane-level planning capabilities, and lane-level guidance capabilities.
  • FIG. 6 is a schematic flowchart of a target detection method provided by an embodiment of the present application. As shown in FIG. 6 , the method includes:
  • The image data may be obtained by the device or apparatus for executing the method of the embodiments of the present application from, for example, a camera.
  • the camera may periodically capture image data, and send the image data to a device or device for executing the method in the embodiments of the present application, and the device or device may use the image data to obtain a point cloud corresponding to the image data. data.
  • The device for executing the method in this embodiment of the present application may send an instruction to capture an image to the camera; the camera may capture image data upon receiving the instruction and send the image data to the device, and the device can use the image data to obtain point cloud data corresponding to the image data.
  • the number of image data may be one or more.
  • the method of obtaining point cloud data from image data can be adapted according to the actual application scene.
  • the image data is two-dimensional plane information
  • the point cloud data is three-dimensional stereo information.
  • When there is a single piece of image data, one realization of using the image data to obtain the point cloud data may be: performing shape from shading (SFS) processing on the image data to obtain the light-dark relationship of the object surface in the single image data; obtaining the relative height information or other parameter information of each pixel on the object surface according to the light-dark relationship; using the parameter information of the object in the image data and the plane information of the object in the image data to recover the three-dimensional information of the single image data; and obtaining the point cloud data corresponding to the image data.
  • When there are multiple pieces of image data, the implementation of using the image data to obtain the point cloud data may be: acquiring a feature area in each piece of image data (that is, an area corresponding to an object in each piece of image data); establishing the corresponding relationship between image data pairs according to the extracted feature areas; calculating the three-dimensional stereo information corresponding to the image data by using the corresponding relationship between the image data pairs and the parameter information of the camera; and obtaining the point cloud data corresponding to the image data.
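  • A hedged sketch of this multi-image route: detect and match feature regions between an image pair, then triangulate with the camera's projection matrices. OpenCV is used purely for illustration; the patent does not name specific feature or matching algorithms:

```python
import cv2
import numpy as np

def points_from_image_pair(img1, img2, proj1, proj2):
    """img1, img2: grayscale images; proj1, proj2: 3x4 camera projection matrices.
    Returns an (N, 3) array of triangulated points (illustrative only)."""
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches]).T  # 2 x N
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches]).T  # 2 x N

    pts_h = cv2.triangulatePoints(proj1, proj2, pts1, pts2)     # 4 x N homogeneous
    return (pts_h[:3] / pts_h[3]).T                             # N x 3
```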
  • the implementation of obtaining point cloud data by using image data may also include other contents according to the actual scene, and the method of obtaining point cloud data by using image data is not limited in the embodiments of the present application.
  • S602. Use the first neural network model to output the first three-dimensional information of the point cloud data.
  • the first three-dimensional information includes information used to represent at least one first solid frame of at least one first target in the image data; the information of the first solid frame includes a first coordinate used to represent the position of the first solid frame, and the first solid frame is used to frame the first target.
  • the number of first targets in the image data may be one or more, and each first target may correspond to one first solid frame.
  • the first neural network model is obtained by training based on point cloud sample data.
  • a possible implementation of using the point cloud sample data to train the first neural network model is: input the point cloud sample data into the neural network model to be trained, use the neural network model to be trained to output predicted 3D information, and use the loss function to compare the gap between the predicted 3D information and the real 3D information.
  • when the gap between the predicted 3D information output by the model and the real 3D information does not satisfy the loss function, adjust the model parameters and continue training; when the gap between the predicted 3D information output by the model and the real 3D information satisfies the loss function, the model training ends, and the first neural network model that can identify point cloud data is obtained.
  • subsequently, the point cloud data can be input into the first neural network model, and the information of the first solid frame that frames the first target can be output; the information of the first solid frame includes the first coordinate used to represent the position of the first solid frame.
  • the point cloud sample data may be obtained by labeling a certain number of point clouds.
  • the number of first objects in the first three-dimensional information identified by the first neural network model is related to the confidence of the first neural network model. For example, the higher the confidence of the first neural network model, the greater the number of first objects in the first three-dimensional information that can be output by the first neural network model, and the higher the accuracy of the identified first objects.
  • the manner of acquiring point cloud data is not limited in the embodiments of the present application.
  • the predicted three-dimensional information may be information, output by the neural network model to be trained, of a predicted solid frame capable of framing the point cloud target; the information of the predicted solid frame includes predicted three-dimensional coordinates used to indicate the position of the predicted solid frame.
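  • the "train until the loss function is satisfied" procedure described above can be sketched as follows. The real point cloud detector architecture is not specified in this embodiment; the tiny per-point network, the smooth-L1 criterion, the 7-value box encoding and the 0.01 stopping threshold below are all assumptions made purely for illustration.

```python
# Illustrative-only training loop for the "first neural network model".
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 7))  # -> (x, y, z, w, h, l, yaw)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.SmoothL1Loss()

def train_first_model(point_cloud_samples, real_3d_info, max_epochs=100, threshold=0.01):
    """point_cloud_samples: (N, 3) tensor; real_3d_info: (N, 7) labeled tensor."""
    for _ in range(max_epochs):
        predicted_3d_info = model(point_cloud_samples)     # predicted 3D information
        loss = criterion(predicted_3d_info, real_3d_info)  # gap vs. real 3D information
        if loss.item() < threshold:                        # loss function satisfied: stop
            break
        optimizer.zero_grad()
        loss.backward()                                    # adjust model parameters
        optimizer.step()
    return model
```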
  • S603. Use the second neural network model to output the two-dimensional information of the image data.
  • the two-dimensional information includes information used to represent at least one plane frame of at least one second target in the image data, and the information of the plane frame includes coordinates used to indicate the position of the plane frame; the plane frame is used to frame the second target.
  • the number of second objects in the image data may be one or more, and each second object may correspond to a plane frame.
  • the second neural network model is obtained by training according to image sample data
  • a possible implementation of training the second neural network model according to the image sample data is: input the image sample data into the neural network model to be trained, use the neural network model to be trained to output predicted two-dimensional information, and use the loss function to compare the gap between the predicted two-dimensional information and the real two-dimensional information.
  • when the gap between the predicted two-dimensional information output by the model and the real two-dimensional information does not satisfy the loss function, adjust the model parameters and continue training; when the gap between the predicted two-dimensional information output by the model and the real two-dimensional information satisfies the loss function, the model training ends, and a second neural network model capable of recognizing image data is obtained.
  • subsequently, the image data can be input into the second neural network model, and the information of the plane frame that frames the second target can be output; the information of the plane frame includes two-dimensional coordinates used to represent the position of the plane frame.
  • the image sample data may be obtained by labeling image data captured by a camera, or may be obtained by labeling image data obtained in an image database.
  • the number of second targets in the two-dimensional information identified by the second neural network model is related to the confidence of the second neural network model. For example, the higher the confidence of the second neural network model, the greater the number of second targets in the two-dimensional information that the second neural network model can output, and the higher the accuracy of the identified second targets.
  • the predicted two-dimensional information may be information, output by the neural network model to be trained, of a predicted plane frame capable of framing the image target; the information of the predicted plane frame includes predicted two-dimensional coordinates used to represent the position of the predicted plane frame.
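  • a sketch of producing the two-dimensional information (plane frames) with an off-the-shelf 2D detector is shown below. The embodiment does not name a specific network; torchvision's Faster R-CNN and the 0.5 score threshold are used here only as stand-ins for the trained "second neural network model".

```python
# Illustrative inference producing plane-frame coordinates for second targets.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

detector = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

@torch.no_grad()
def detect_2d(image_tensor, score_threshold=0.5):
    """image_tensor: CxHxW float tensor in [0, 1]; returns Nx4 plane-frame coords."""
    output = detector([image_tensor])[0]          # dict with 'boxes', 'labels', 'scores'
    keep = output["scores"] >= score_threshold    # confidence-related filtering
    return output["boxes"][keep]                  # (x1, y1, x2, y2) per second target
```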
  • S604. Determine the second three-dimensional information according to the depth information of the image data and the two-dimensional information of the image data.
  • the second three-dimensional information includes information of at least one second solid frame used to represent at least one second target in the image data; the information of the second solid frame includes a second coordinate used to represent the position of the second solid frame, and the second solid frame is used to frame the second target.
  • converting the two-dimensional information of the image data into the three-dimensional information may be determining the second three-dimensional information corresponding to the image data according to the depth information of the image data and the two-dimensional information of the image data.
  • the plane information in the two-dimensional information of the image data is relatively accurate; therefore, when the second three-dimensional information corresponding to the image data is determined by using the depth information of the image data and the two-dimensional information of the image data, the plane information in that three-dimensional information is also relatively accurate.
  • accordingly, during fusion, the weight of the information used to represent the depth of the image data (or depth information) in the first three-dimensional information is greater than the weight of the information used to represent the depth of the image data in the second three-dimensional information, and the weight of the information used to represent the image data plane in the first three-dimensional information is smaller than the weight of the information used to represent the image data plane in the second three-dimensional information.
  • the same target indicates that the first target in the first three-dimensional information and the second target in the second three-dimensional information are targets of the same object.
  • the number of identical targets may be one or more, and each identical target includes a first target and a second target.
  • whether the first target in the first three-dimensional information and the second target in the second three-dimensional information are the same target can be determined by using the overlap ratio (or intersection-over-union) of the first target in the first three-dimensional information and the second target in the second three-dimensional information.
  • the larger the overlap ratio, the more the first target in the first three-dimensional information and the second target in the second three-dimensional information overlap, which indicates that the two point to the same target. Therefore, when the overlap ratio of the first target in the first three-dimensional information and the second target in the second three-dimensional information is greater than or equal to a set threshold, the same target in the first three-dimensional information and the second three-dimensional information can be determined.
  • for example, when the number of first targets in the first three-dimensional information is one and the number of second targets in the second three-dimensional information is one, if the overlap ratio of the first target and the second target is greater than or equal to the threshold, the first target and the second target are determined to be the same target in the first three-dimensional information and the second three-dimensional information.
  • when the number of first targets in the first three-dimensional information is multiple and the number of second targets in the second three-dimensional information is multiple, each first target in the first three-dimensional information is paired with a second target in the second three-dimensional information, the overlap ratio of each pair of first and second targets is calculated separately, and each pair whose overlap ratio is greater than or equal to the threshold is determined to be the same target, thereby obtaining the same targets in the first three-dimensional information and the second three-dimensional information.
  • when the overlap ratio of the first target in the first three-dimensional information and the second target in the second three-dimensional information is less than the threshold, the first target in the first three-dimensional information and the second target in the second three-dimensional information are considered not to correspond to the same target.
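  • the association step described above can be sketched as follows. For simplicity the boxes are reduced to axis-aligned 2D footprints, and the 0.5 threshold is an assumed value; the embodiment only requires that pairs whose overlap ratio meets a set threshold be treated as the same target.

```python
# Illustrative "same target" matching by overlap ratio (intersection-over-union).
def overlap_ratio(a, b):
    """a, b: (x1, y1, x2, y2). Returns intersection-over-union."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match_same_targets(first_targets, second_targets, threshold=0.5):
    """Returns index pairs (i, j) whose overlap ratio is >= threshold."""
    pairs = []
    for i, a in enumerate(first_targets):
        for j, b in enumerate(second_targets):
            if overlap_ratio(a, b) >= threshold:
                pairs.append((i, j))        # first target i and second target j are the same target
    return pairs
```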
  • the target detection result may be three-dimensional information of the target, and may optionally include information used to indicate the type, position, size, or speed of the target; the same targets of the first three-dimensional information and the second three-dimensional information contained in the target detection result may be one or more, and the embodiments of the present application do not limit the specific content and quantity of the target detection result.
  • the first target, the second target or the target detection result may be the data required for automatic driving such as the identified vehicle, pedestrian, road sign or obstacle.
  • the process of fusing the same target in the first three-dimensional information and the second three-dimensional information may be: assigning a weight to each dimension of information (including the x dimension, the y dimension and the z dimension) in the first three-dimensional information and the second three-dimensional information, and obtaining the target detection result by using the weight of each dimension of information in the two pieces of three-dimensional information.
  • among them, the weight of the information used to represent the depth of the image data (the z dimension) in the first three-dimensional information is greater than the weight of the information used to represent the depth of the image data in the second three-dimensional information, and the weight of the information used to represent the image data plane (the x and y dimensions) in the first three-dimensional information is smaller than the weight of the information used to represent the image data plane in the second three-dimensional information.
  • the method for fusing the first three-dimensional information and the second three-dimensional information may use a neural network model, a mathematical model, a statistical method, or another method chosen according to the actual scenario; the embodiments of the present application do not limit the method for fusing the first three-dimensional information and the second three-dimensional information.
  • the first three-dimensional information is the three-dimensional information of the first target output by using the first neural network model, and the depth information in the first three-dimensional information is relatively accurate;
  • the second three-dimensional information is converted from the two-dimensional information output by the second neural network model, so the plane information in the second three-dimensional information is relatively accurate.
  • therefore, to preserve the accuracy of the depth information, the weight of the information used to represent the depth of the image data in the first three-dimensional information is set relatively large; to preserve the accuracy of the plane information in the second three-dimensional information, the weight of the information used to represent the image data plane in the second three-dimensional information is set relatively large. In this way, the target detection result obtained by fusion combines the advantageous information of both pieces of three-dimensional information, and a relatively accurate target detection result can be obtained.
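  • the per-dimension weighting described above can be sketched for one matched pair as follows. The values 0.8/0.2 are illustrative weights consistent with the 0.5 < α, β < 1 ranges given later; the embodiment does not fix the weight values or the fusion operator.

```python
# Illustrative weighted fusion of one "same target": depth-related dimensions
# (z and length) take most of their value from the first three-dimensional
# information, plane-related dimensions (x, y, width, height) from the second.
import numpy as np

def fuse_same_target(first_3d, second_3d, alpha=0.8, beta=0.8):
    """first_3d, second_3d: arrays (X, Y, Z, W, H, L) describing the same target."""
    x1, y1, z1, w1, h1, l1 = first_3d
    x2, y2, z2, w2, h2, l2 = second_3d
    return np.array([
        alpha * x2 + (1 - alpha) * x1,   # plane info favours the second 3D info
        alpha * y2 + (1 - alpha) * y1,
        beta * z1 + (1 - beta) * z2,     # depth info favours the first 3D info
        alpha * w2 + (1 - alpha) * w1,
        alpha * h2 + (1 - alpha) * h1,
        beta * l1 + (1 - beta) * l2,
    ])
```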
  • in a possible implementation, S606 includes: using a third neural network model to fuse the same target in the first three-dimensional information and the second three-dimensional information to obtain the target detection result; in the loss function of the third neural network model, the weight of the information used to represent the depth of the image data in the first three-dimensional information is greater than the weight of the information used to represent the depth of the image data in the second three-dimensional information, and the weight of the information used to represent the image data plane in the first three-dimensional information is smaller than the weight of the information used to represent the image data plane in the second three-dimensional information.
  • a possible implementation of training the third neural network model is: input two kinds of sample data into the neural network model to be trained, where the two kinds of sample data may respectively be sample data obtained by labeling the data output by the first neural network model, and sample data obtained by labeling the data generated by superimposing the depth information on the data output by the second neural network model; use the neural network model to be trained to output a predicted fusion result of the two kinds of sample data; and use the loss function to compare the gap between the predicted fusion result and the real result. When the gap between the predicted fusion result output by the model and the real result does not satisfy the loss function, adjust the model parameters and continue training; when the gap between the predicted fusion result output by the model and the real result satisfies the loss function, the model training ends, and a third neural network model that can output more accurate target detection results is obtained. Subsequently, the same target in the first three-dimensional information and the second three-dimensional information can be input into the third neural network model, and the target detection result obtained by fusing the same target in the two pieces of three-dimensional information can be output.
  • FIG. 7 shows a schematic flowchart of obtaining a target detection result.
  • the first three-dimensional information output by the first neural network model and the second three-dimensional information obtained by three-dimensional conversion of the two-dimensional information output by the second neural network model are input into the third neural network model, and the target detection result can then be output by the third neural network model.
  • the first three-dimensional information includes (X1Y1Z1, W1H1L1), where X1Y1Z1 is the first coordinate and W1H1L1 represents the length, width and height of the first solid frame;
  • the second three-dimensional information includes (X2Y2Z2, W2H2L2), where X2Y2Z2 is the second coordinate and W2H2L2 represents the length, width and height of the second solid frame.
  • the loss function loss can satisfy the following formula: loss = f((αX2+(1-α)X1), (αY2+(1-α)Y1), ((1-β)Z2+βZ1), (αW2+(1-α)W1), (αH2+(1-α)H1), ((1-β)L2+βL1)), where 0.5 < α < 1 and 0.5 < β < 1;
  • α is the weight applied between the plane information (X1Y1, W1H1) of the first three-dimensional information and the plane information (X2Y2, W2H2) of the second three-dimensional information, and β is the weight applied between the depth information (Z1, L1) of the first three-dimensional information and the depth information (Z2, L2) of the second three-dimensional information.
  • for example, when the value of α (the weight applied in the second three-dimensional information) is in the range (0.5, 1), the corresponding value 1-α applied in the first three-dimensional information is in the range (0, 0.5); this means that the plane information (X2Y2, W2H2) in the second three-dimensional information occupies a relatively high weight, which reflects the higher accuracy of the plane information in the second three-dimensional information. Similarly, when the value of β (the weight applied in the first three-dimensional information) is in the range (0.5, 1), the corresponding value 1-β applied in the second three-dimensional information is in the range (0, 0.5); this means that the depth information (Z1, L1) in the first three-dimensional information occupies a relatively high weight, which reflects the higher accuracy of the depth information in the first three-dimensional information.
  • in a possible implementation, the loss function of the third neural network model is related to one or more of the following: the confidence level of the first neural network model, the confidence level of the second neural network model, the intersection-over-union between the output result of the first neural network model and the real samples of the first neural network model, the intersection-over-union between the output result of the second neural network model and the real samples of the second neural network model, the normalized value of the data in the first neural network model, or the normalized value of the data in the second neural network model.
  • the confidence level of the first neural network model may be the accuracy of the first three-dimensional information predicted and output by the first neural network model; the confidence level of the second neural network model may be the accuracy of the two-dimensional information predicted and output by the second neural network model.
  • a neural network model is used to identify a vehicle, and the confidence level indicates the accuracy of classifying an object as a vehicle by using the neural network model.
  • when the loss function of the third neural network model is related to the confidence Ci of the first neural network model and the confidence Cp of the second neural network model, the loss function loss satisfies the following formula:
  • the intersection-over-union between the output result of the first neural network model and the real samples of the first neural network model can be understood as follows: the first neural network model outputs a predicted solid frame that frames the target, the real sample frames the target with a real solid frame, and the intersection-over-union represents the overlap ratio between the predicted solid frame and the real solid frame. Similarly, the intersection-over-union between the output result of the second neural network model and the real samples of the second neural network model can be understood as follows: the second neural network model outputs a predicted plane frame that frames the target, the real sample frames the target with a real plane frame, and the intersection-over-union represents the overlap ratio between the predicted plane frame and the real plane frame.
  • when the loss function of the third neural network model is related to the intersection-over-union IoUi between the output result of the first neural network model and the real samples of the first neural network model, and to the intersection-over-union between the output result of the second neural network model and the real samples of the second neural network model, the loss function loss satisfies the following formula:
  • the normalized value of the data in the first neural network model refers to a value obtained by normalizing the point cloud data input into the first neural network model, so that the point cloud data can be mapped into a specific interval; the normalized value of the data in the second neural network model refers to a value obtained by normalizing the image data input into the second neural network model, so that the image data can be mapped into a specific interval.
  • when the loss function of the third neural network model is related to the normalized value Ei of the data in the first neural network model and the normalized value Ep of the data in the second neural network model, the loss function loss satisfies the following formula:
  • the loss function and the third neural network model can be used to obtain more accurate target detection results.
  • S601 includes: performing three-dimensional reconstruction on image data to obtain point cloud data.
  • the three-dimensional reconstruction step may include but is not limited to the following steps:
  • the camera may be a binocular camera, and the binocular camera may collect image pair data from different angles, and transmit the data to the processing device of the autonomous vehicle through the bus interface for processing.
  • the image data may be processed by a processing device (such as a processor) of the autonomous vehicle, or the image data captured by the camera may be processed by an independent computing device on the server.
  • multiple pieces of calibration plate image data can be collected, and the inner area of the calibration plate can be found through threshold segmentation; the edge of each dot of the calibration plate can be obtained through a sub-pixel edge extraction method.
  • obtain the circle-center coordinates of the dots through least-squares circle fitting; determine the correspondence between the circle-center coordinates and their projections in the image data, as well as the approximate positional relationship between the calibration plate and the camera, which serves as the initial value of the camera's extrinsic parameters; call the Halcon library functions to determine the intrinsic and extrinsic parameters of the two cameras and the relative positional relationship between them; and determine the relevant parameter information of the binocular camera by averaging multiple measurements.
  • the image data obtained from the binocular camera is corrected to a standard epipolar geometry, and the camera parameters of the corrected binocular camera are obtained.
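  • the calibration and rectification steps above can be sketched with open-source tools as follows. OpenCV's circle-grid routines are substituted for the Halcon functions named in the text, and the 7x7 grid size and image file names are assumptions for illustration only.

```python
# Hedged sketch: binocular calibration with a dot (circle-grid) calibration plate,
# followed by rectification to a standard epipolar geometry.
import cv2
import glob
import numpy as np

pattern = (7, 7)                                          # circle-grid size (assumed)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_pts, left_pts, right_pts = [], [], []
for lf, rf in zip(sorted(glob.glob("left_*.png")), sorted(glob.glob("right_*.png"))):
    left = cv2.imread(lf, cv2.IMREAD_GRAYSCALE)
    right = cv2.imread(rf, cv2.IMREAD_GRAYSCALE)
    ok_l, c_l = cv2.findCirclesGrid(left, pattern)        # dot centres, left image
    ok_r, c_r = cv2.findCirclesGrid(right, pattern)       # dot centres, right image
    if ok_l and ok_r:
        obj_pts.append(objp); left_pts.append(c_l); right_pts.append(c_r)

size = left.shape[::-1]
_, K1, D1, _, _ = cv2.calibrateCamera(obj_pts, left_pts, size, None, None)   # intrinsics, camera 1
_, K2, D2, _, _ = cv2.calibrateCamera(obj_pts, right_pts, size, None, None)  # intrinsics, camera 2
_, K1, D1, K2, D2, R, T, _, _ = cv2.stereoCalibrate(
    obj_pts, left_pts, right_pts, K1, D1, K2, D2, size,
    flags=cv2.CALIB_FIX_INTRINSIC)                        # relative pose of the two cameras
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, size, R, T)      # standard epipolar geometry
```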
  • the preprocessing may include: converting the corrected color image data pairs into grayscale image data through a weighted average method; performing histogram equalization so that the grayscale distribution of the image data becomes more uniform and the grayscale spacing of the occupied pixels is widened, which increases the contrast of the image data and achieves image enhancement; and performing global threshold segmentation to obtain the regions of interest in the image data.
  • the disparity search space is estimated from the epipolar constraint and the distance between the binocular camera and the target object, thereby reducing the matching search range; a series of coarse grids is introduced through multi-grid technology to accelerate the convergence of the partial differential equations and improve matching speed; through fine-grid iteration, the residuals are restricted from the finest grid to the coarser grids in turn, and a similarity criterion combining pixel grayscale, gradient and smoothness is used to search for matching points in the coarse-grid search space to obtain disparity values; the disparity values obtained on the coarse grids are extended to the fine grids in turn, and the disparity value of the final matching point is obtained through combined correction; the entire image data is traversed according to the above steps until a complete, continuous disparity map is obtained.
  • the three-dimensional space coordinates of each point of the image data are then obtained from the disparity map, and the point cloud data corresponding to the image data is thereby obtained.
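  • the disparity-to-point-cloud step can be sketched as follows. The text above describes a multi-grid PDE matcher; OpenCV's semi-global block matching is substituted here purely for illustration, and the SGBM parameters are assumptions.

```python
# Illustrative: rectified stereo pair -> disparity map -> point cloud.
import cv2
import numpy as np

def stereo_to_point_cloud(left_gray, right_gray, Q):
    """left_gray/right_gray: rectified grayscale images; Q: reprojection matrix
    from cv2.stereoRectify. Returns an Nx3 point cloud."""
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                    blockSize=5)           # assumed matcher settings
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    points = cv2.reprojectImageTo3D(disparity, Q)           # per-pixel 3D coordinates
    mask = disparity > disparity.min()                      # keep matched pixels only
    return points[mask].reshape(-1, 3)
```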
  • the method for obtaining point cloud data by performing 3D reconstruction on image data may also include other contents according to the actual application scenario; other methods that can perform 3D reconstruction on image data to obtain point cloud data are not limited in the embodiments of the present application.
  • a monocular camera may be used to capture a series of continuous image data
  • S604 includes: acquiring adjacent image data of the image data captured during the automatic driving process; using the image data and the adjacent image data of the image data to calculate the depth information of the image data; and fusing the depth information of the image data and the two-dimensional information of the image data to obtain the second three-dimensional information.
  • the monocular camera may be an ordinary RGB camera capable of capturing continuous image data. It can be understood that the monocular camera may also be any shooting device capable of continuous shooting, such as a camera built into a vehicle; the type of the monocular camera is not limited in the embodiments of the present application.
  • a plurality of image data captured by a monocular camera can be used, taking any image data and adjacent image data of the image data as an example, to calculate the depth information of the image data, and the process can include:
  • the process may include: obtaining the focal length and depth-of-field information of the image data and of each piece of adjacent image data; comparing and matching them, and establishing a mapping relationship for the same pixel displayed in the different pieces of image data; calculating the exact distance between that pixel and the camera lens on the basis of this mapping relationship or by other methods; and using the clear pixels obtained by comparing and matching the image data with its adjacent image data, together with the exact distance between each pixel and the camera lens, to obtain the depth information of each pixel in the image data. The two-dimensional information of the image data and the depth information of the image data can subsequently be fused to obtain the second three-dimensional information.
  • the depth information can be fitted by using interpolation and other conversion processing algorithms.
  • the depth-of-field information indicates that, during focusing of the captured image data, a sharp image is formed on the focal plane through the lens, and the obtained information includes the interval over which the image data is sharp.
  • the fusion of the depth information of the image data and the two-dimensional information of the image data may be as follows: each pixel in the two-dimensional information of the image data is assigned depth information corresponding to the pixel to obtain the second three-dimensional information.
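  • a minimal sketch of this fusion step is given below: each plane frame from the two-dimensional information is combined with the depth assigned to its pixels to produce a second solid frame. Using the median depth inside the frame and the pinhole relations below is an assumption; the embodiment only requires that the depth information be attached to the two-dimensional information.

```python
# Illustrative: plane frame (2D box) + per-pixel depth -> second solid frame.
import numpy as np

def plane_frame_to_solid_frame(box2d, depth_map, fx, fy, cx, cy):
    """box2d: (x1, y1, x2, y2) in pixels; depth_map: HxW depths in metres.
    Returns (X, Y, Z, W, H, L) describing a second solid frame."""
    x1, y1, x2, y2 = [int(v) for v in box2d]
    region = depth_map[y1:y2, x1:x2]
    valid = region[region > 0]
    z = float(np.median(valid))                       # representative depth of the target
    u, v = (x1 + x2) / 2.0, (y1 + y2) / 2.0           # plane-frame centre in pixels
    X = (u - cx) * z / fx                             # back-project the centre
    Y = (v - cy) * z / fy
    W = (x2 - x1) * z / fx                            # metric width from pixel width
    H = (y2 - y1) * z / fy                            # metric height from pixel height
    L = float(valid.max() - valid.min())              # crude length from depth spread
    return X, Y, z, W, H, L
```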
  • the landmark element in the high-precision map is updated according to the target detection result.
  • the high-precision map can not only describe the road, but also reflect the conditions of vehicles included in each road, and can more truly reflect the actual style of the road during driving.
  • the landmark elements in the high-precision map may be elements that identify geographic locations, and may include: lane elements and lane-line elements between lanes, road sign elements, street light elements, obstacle elements, greening belt elements, and other landmark elements that identify the geographic location of an object.
  • the high-precision map can also include richer road information elements, such as the road shape and data on the slope, curvature and roll of each lane, the style of each lane and of the lane lines between lanes, and other road information elements required for autonomous driving, such as the road height limit and the content of arrows and text on the road.
  • the landmark element at the corresponding position in the high-precision map can be updated by comparing the position of the target detection result with the high-precision map.
  • updating the landmark element in the high-precision map according to the target detection result includes: determining the landmark detection result used to represent a landmark in the target detection result; determining the location of the landmark detection result in the high-precision map; and adding the landmark to the high-precision map according to the location of the landmark detection result in the high-precision map.
  • determining the landmark detection result representing the landmark in the target detection result may be, in the target detection result obtained in the embodiment of the present application, identifying the landmark detection result representing the landmark.
  • the method for recognizing the landmark detection result may be: training a neural network model capable of recognizing the landmark detection result, and using the neural network model to identify the object in the target detection result as the landmark detection result of the landmark.
  • the landmark detection result may include other types of landmark information such as location information of the landmark and size information of the landmark.
  • determining the location of the landmark detection result in the high-precision map may be: determining the location information of the landmark detection result, comparing the location information of the landmark detection result with the location information in the high-precision map, and obtaining the landmark detection result The specific location in the high-precision map.
  • adding a landmark to the high-precision map may be: if the location of the landmark detection result in the high-precision map does not contain a landmark element, the landmark detection result is added to the high-precision map; if the location in the high-precision map already contains a landmark element, and the landmark in the high-precision map is different from the landmark detected in the embodiment of the present application, the landmark detection result of the embodiment of the present application replaces the landmark in the high-precision map.
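  • the add-or-replace logic above can be sketched as follows. The dictionary map structure, the position-to-grid-cell rounding and the Landmark type are all assumptions made for illustration; real HD maps store lane-level geometry rather than a simple key-value map.

```python
# Illustrative landmark update: add when absent, replace when different.
from dataclasses import dataclass

@dataclass
class Landmark:
    kind: str                  # e.g. "road_sign", "street_light"
    position: tuple            # (x, y) map coordinates
    size: tuple                # (w, h, l)

def update_hd_map(hd_map: dict, detection: Landmark, grid=0.5):
    """hd_map: {grid cell -> Landmark}. Returns the updated map."""
    key = (round(detection.position[0] / grid), round(detection.position[1] / grid))
    existing = hd_map.get(key)
    if existing is None:
        hd_map[key] = detection        # location has no landmark element: add it
    elif existing != detection:
        hd_map[key] = detection        # a different landmark is recorded: replace it
    return hd_map
```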
  • a possible implementation manner further includes: determining an automatic driving strategy according to the target detection result.
  • the automatic driving strategy may be a manner of instructing the operation of the automatic driving vehicle.
  • the high-precision map can be used as an important reference for determining the automatic driving strategy in the automatic driving process.
  • the automatic driving strategy may include: instructing the automatic driving vehicle to turn, change lanes, change gears, yield to other vehicles or pedestrians, and other automatic driving strategies.
  • for example, when the autonomous vehicle detects that the target detection result in the current scene meets the conditions for changing lanes, it executes the lane-change strategy based on the real-time changes of the vehicles in the autonomous driving scene. If, while the lane-change strategy is being executed, a vehicle in front of the autonomous vehicle suddenly decelerates so that the lane-change conditions are no longer met, the autonomous vehicle can abandon the lane-change strategy and continue driving within the current lane.
  • in another example, when the target detection result indicates a pedestrian crossing the road ahead, the self-driving vehicle executes the yield strategy: it stops, waits until the pedestrian has crossed the road, and then continues driving.
  • the automatic driving strategy also includes other contents according to different actual scenarios, which are not limited in this embodiment of the present application.
  • the target detection result can identify the complex road conditions in the current autonomous driving scene, and the autonomous vehicle can perform autonomous driving more accurately based on the target detection result.
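  • a minimal sketch of mapping target detection results to such strategies is given below. The detection fields and the distance thresholds are assumptions; the embodiment only states that strategies such as lane changing or yielding are selected according to the target detection result.

```python
# Illustrative strategy selection from target detection results.
def choose_strategy(detections, lane_change_requested=False):
    """detections: list of dicts like {'type': 'pedestrian', 'distance_m': 12.0}."""
    for det in detections:
        if det["type"] == "pedestrian" and det["distance_m"] < 20.0:
            return "yield"                     # stop and wait for the pedestrian to cross
        if det["type"] == "vehicle" and det["distance_m"] < 10.0 and lane_change_requested:
            return "abort_lane_change"         # sudden obstacle ahead: give up the lane change
    return "lane_change" if lane_change_requested else "keep_lane"
```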
  • the above implementing devices include hardware structures and/or software units corresponding to executing the functions.
  • the present application can be implemented in hardware or a combination of hardware and computer software with the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein. Whether a function is performed by hardware or computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of this application.
  • FIG. 8 is a target detection apparatus provided by an embodiment of the present application, and the target detection apparatus includes a processor 800 and a memory 801 .
  • the processor 800 is responsible for managing the bus architecture and general processing, and the memory 801 may store data used by the processor 800 when performing operations.
  • the bus architecture may include any number of interconnected buses and bridges that link together one or more processors represented by the processor 800 and various memory circuits represented by the memory 801.
  • the bus architecture may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and, therefore, will not be described further herein.
  • the bus interface provides the interface.
  • the processes disclosed in the embodiments of the present application may be applied to the processor 800 or implemented by the processor 800 .
  • each step of the flow of target detection can be completed by an integrated logic circuit of hardware in the processor 800 or an instruction in the form of software.
  • the processor 800 may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the embodiments of the present application.
  • a general purpose processor may be a microprocessor or any conventional processor or the like.
  • the steps of the methods disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory 801, and the processor 800 reads the information in the memory 801, and completes the steps of the signal processing flow in combination with its hardware.
  • the processor 800 is configured to read the program in the memory 801 and execute the method flow in S601-S605 shown in FIG. 6 .
  • FIG. 9 is a schematic structural diagram of a chip according to an embodiment of the present application.
  • Chip 900 includes one or more processors 901 and interface circuits 902 .
  • the chip 900 may further include a bus 903, where:
  • the processor 901 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above-mentioned method may be completed by an integrated logic circuit of hardware in the processor 901 or an instruction in the form of software.
  • the above-mentioned processor 901 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, an MCU, an MPU, a CPU, or one or more coprocessors.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the interface circuit 902 can be used for sending or receiving data, instructions or information.
  • the processor 901 can use the data, instructions or other information received by the interface circuit 902 to process, and can send the processing completion information through the interface circuit 902.
  • the chip further includes a memory, which may include a read-only memory and a random access memory, and provides operation instructions and data to the processor.
  • a portion of the memory may also include non-volatile random access memory (NVRAM).
  • the memory stores executable software modules or data structures
  • the processor may execute corresponding operations by calling operation instructions stored in the memory (the operation instructions may be stored in the operating system).
  • the chip may be used in the target detection apparatus involved in the embodiments of the present application.
  • the interface circuit 902 may be used to output the execution result of the processor 901 .
  • processor 901 and the interface circuit 902 can be implemented by hardware design, software design, or a combination of software and hardware, which is not limited here.
  • an embodiment of the present application provides an apparatus for target detection, where the apparatus includes at least one processing unit 1001 .
  • the embodiment of the present application also provides an electronic device.
  • the electronic device includes: a processing unit configured to support the target detection apparatus to perform the steps in the above embodiments, for example, S601 may be performed.
  • the target detection device includes but is not limited to the unit modules listed above.
  • the specific functions that can be implemented by the above functional units also include but are not limited to the functions corresponding to the method steps described in the above examples.
  • for detailed descriptions of the other units of the electronic device, refer to the detailed descriptions of the corresponding method steps; details are not repeated here in the embodiments of the present application.
  • the electronic device involved in the above embodiments may include: a processing unit, a storage unit, and a communication unit.
  • the storage unit is used to store the program codes and data of the electronic device.
  • the communication unit is used to support the communication between the electronic device and other network entities, so as to realize the functions of the electronic device's call, data interaction, Internet access and so on.
  • the processing unit is used to control and manage the actions of the electronic device.
  • the processing unit may be a processor or a controller.
  • the communication unit may be a transceiver, an RF circuit or a communication interface or the like.
  • the storage unit may be a memory.
  • the electronic device may further include an input unit and a display unit.
  • the display unit may be a screen or a display.
  • the input unit may be a touch screen, a voice input device, or a fingerprint sensor.
  • the processing unit 1001 is configured to obtain point cloud data by using image data; the processing unit is further configured to output the first three-dimensional information of the point cloud data by using the first neural network model; the first three-dimensional information includes information of at least one first solid frame used to represent at least one first target in the image data, the information of the first solid frame includes a first coordinate used to represent the position of the first solid frame, and the first solid frame is used to frame the first target.
  • the processing unit 1001 is further configured to use the second neural network model to output two-dimensional information of the image data; the two-dimensional information includes information used to represent at least one plane frame of at least one second target in the image data, and the information of the plane frame includes The coordinates used to represent the position of the plane box; the plane box is used to frame the second target.
  • the processing unit 1001 is further configured to determine second three-dimensional information according to the depth information of the image data and the two-dimensional information of the image data; the second three-dimensional information includes information of at least one second solid frame used to represent at least one second target in the image data, the information of the second solid frame includes second coordinates used to represent the position of the second solid frame, and the second solid frame is used to frame the second target.
  • the processing unit 1001 is further configured to fuse the same target in the first three-dimensional information and the second three-dimensional information to obtain a target detection result; in the process of fusing the same target in the first three-dimensional information and the second three-dimensional information, the weight of the information used to represent the depth of the image data in the first three-dimensional information is greater than the weight of the information used to represent the depth of the image data in the second three-dimensional information, and the weight of the information used to represent the image data plane in the first three-dimensional information is smaller than the weight of the information used to represent the image data plane in the second three-dimensional information.
  • the processing unit is specifically configured to use the third neural network model to fuse the first three-dimensional information and the second three-dimensional information to obtain the target detection result; wherein, in the loss function of the third neural network model, the third The weight of the information used to represent the depth of the image data in the first three-dimensional information is greater than the weight of the information used to represent the depth of the image data in the second three-dimensional information, and the weight of the information used to represent the image data plane in the first three-dimensional information is smaller than that of the second three-dimensional information. The weight of the information used to represent the image data plane in the information.
  • the loss function of the third neural network model is related to one or more of the following: the confidence level of the first neural network model, the confidence level of the second neural network model, the intersection-over-union between the output result of the first neural network model and the real samples of the first neural network model, the intersection-over-union between the output result of the second neural network model and the real samples of the second neural network model, the normalized value of the data in the first neural network model, or the normalized value of the data in the second neural network model.
  • the first three-dimensional information includes (X1Y1Z1, W1H1L1), where X1Y1Z1 is the first coordinate and W1H1L1 represents the length, width and height of the first solid frame;
  • the second three-dimensional information includes (X2Y2Z2, W2H2L2), where X2Y2Z2 is the second coordinate and W2H2L2 represents the length, width and height of the second solid frame;
  • the loss function loss satisfies the following formula: loss = f((αX2+(1-α)X1), (αY2+(1-α)Y1), ((1-β)Z2+βZ1), (αW2+(1-α)W1), (αH2+(1-α)H1), ((1-β)L2+βL1)), where 0.5 < α < 1 and 0.5 < β < 1.
  • the processing unit is specifically configured to perform three-dimensional reconstruction on the image data to obtain point cloud data.
  • the processing unit is specifically configured to acquire adjacent image data of the image data captured in the process of automatic driving; use the image data and the adjacent image data of the image data to calculate the depth information of the image data; fusion; The depth information of the image data and the two-dimensional information of the image data are used to obtain second three-dimensional information.
  • the processing unit is further configured to update the landmark element in the high-precision map according to the target detection result.
  • the processing unit is specifically configured to determine the landmark detection result used to represent the landmark in the target detection result; determine the location of the landmark detection result in the high-precision map; according to the landmark detection result in the high-precision map location, and add landmarks in the high-definition map.
  • the processing unit is further configured to determine the automatic driving strategy according to the target detection result.
  • the present application provides a vehicle; the vehicle includes at least one camera 1101, at least one memory 1102, and at least one processor 1103.
  • the camera 1101 is used to acquire images, and the images are used to obtain the target detection result.
  • the memory 1102 is used to store one or more programs and data information; wherein the one or more programs include instructions.
  • the processor 1103 is configured to: obtain point cloud data by using the image data; output first three-dimensional information of the point cloud data by using a first neural network model, where the first three-dimensional information includes information of at least one first solid frame used to represent at least one first target in the image data, the information of the first solid frame includes first coordinates used to represent the position of the first solid frame, and the first solid frame is used to frame the first target; output two-dimensional information of the image data by using a second neural network model, where the two-dimensional information includes information of at least one plane frame used to represent at least one second target in the image data, the information of the plane frame includes coordinates used to indicate the position of the plane frame, and the plane frame is used to frame the second target; determine second three-dimensional information according to the depth information of the image data and the two-dimensional information of the image data, where the second three-dimensional information includes information of at least one second solid frame used to represent the at least one second target in the image data, the information of the second solid frame includes second coordinates used to represent the position of the second solid frame, and the second solid frame is used to frame the second target; and fuse the same target in the first three-dimensional information and the second three-dimensional information to obtain the target detection result, where, in the process of fusing the same target in the first three-dimensional information and the second three-dimensional information, the weight of the information used to represent the depth of the image data in the first three-dimensional information is greater than the weight of the information used to represent the depth of the image data in the second three-dimensional information, and the weight of the information used to represent the image data plane in the first three-dimensional information is smaller than the weight of the information used to represent the image data plane in the second three-dimensional information.
  • various aspects of the target detection method provided by the embodiments of the present application may also be implemented in the form of a program product, which includes program code; when the program code runs on a computer device, the program code is used to cause the computer device to perform the steps of the target detection method according to the various exemplary embodiments of the present application described in this specification.
  • the program product may employ any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples (non-exhaustive list) of readable storage media include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • a program product for object detection may employ a portable compact disc read only memory (CD-ROM) and include program codes, and may run on a server device.
  • however, the program product of the present application is not limited thereto; in this document, a readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by an instruction execution system, apparatus, or device, or used in combination therewith.
  • a readable signal medium may include a propagated data signal in baseband or as part of a carrier wave, carrying readable program code therein. Such propagated data signals may take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a readable signal medium may also be any readable medium, other than a readable storage medium, that can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device.
  • Program code embodied on a readable medium may be transmitted using any suitable medium including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Program code for performing the operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages—such as Java, C++, etc., as well as conventional procedural programming Language - such as the "C" language or similar programming language.
  • the program code may execute entirely on the user computing device, partly on the user device as a stand-alone software package, partly on the user computing device and partly on a remote computing device, or entirely on the remote computing device or server.
  • the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device.
  • the embodiments of the present application further provide a computing-device-readable storage medium for the target detection method, that is, a medium whose content is not lost after power-off.
  • a software program, including program code, is stored in the storage medium; when the software program is read and executed by one or more processors, any one of the above target detection solutions in the embodiments of the present application can be implemented.
  • the present application may also be implemented in hardware and/or software (including firmware, resident software, microcode, etc.). Still further, the present application may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium, for use by an instruction execution system or in combination with an instruction execution system.
  • a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Traffic Control Systems (AREA)

Abstract

Embodiments of the present application provide a target detection method and apparatus, relating to the field of intelligent driving or autonomous driving, including: obtaining point cloud data by using image data; outputting first three-dimensional information of the point cloud data by using a first neural network model; outputting two-dimensional information of the image data by using a second neural network model; determining second three-dimensional information according to depth information of the image data and the two-dimensional information of the image data; and fusing the same target in the first three-dimensional information and the second three-dimensional information to obtain a target detection result. In this way, the respective advantages of the first three-dimensional information and the second three-dimensional information can be combined to obtain a relatively accurate target detection result, and since the target detection result is three-dimensional, it reflects the characteristics of the target more comprehensively.

Description

Target detection method and apparatus
Technical Field
The present application relates to the field of intelligent driving or autonomous driving, and in particular to a target detection method and apparatus.
Background
The development of information technology has brought great convenience to people's lives, and autonomous driving technology, led by artificial intelligence and the automotive industry, has gradually become a focus of the industry. Autonomous driving technology relies on the cooperation of computer vision, radar, monitoring devices, global positioning systems and the like, so that a motor vehicle can drive automatically without active human operation. Autonomous vehicles use various computing systems to help transport passengers from one location to another. Some autonomous vehicles may require some initial or continuous input from an operator (such as a pilot, driver, or passenger). An autonomous vehicle permits the operator to switch from a manual operation mode to an autonomous driving mode or to a mode in between. Because autonomous driving technology does not require a human to drive the motor vehicle, it can in theory effectively avoid human driving errors, reduce the occurrence of traffic accidents, and improve highway transport efficiency. Autonomous driving technology therefore receives increasing attention. Target detection is an important research topic in autonomous driving. For example, in autonomous driving, a camera can be used to capture road conditions, and target detection can identify objects such as obstacles, road signs or vehicles in the image data, obtaining the category and position of each object, so that the autonomous vehicle can plan its driving route based on the categories and positions of the identified objects.
One possible implementation of target recognition is to train a neural network model that outputs two-dimensional rectangular boxes, where a two-dimensional rectangular box can be used to represent a target recognized from the image data. After reading the image data from the camera, the processing device uses the neural network model to output two-dimensional rectangular boxes representing the targets in the image data, obtaining the target recognition result.
However, in the above target recognition process, the target detection is not sufficiently accurate.
Summary
Embodiments of the present application provide a target detection method and apparatus, relating to the fields of intelligent driving and autonomous driving, which can obtain a relatively accurate three-dimensional target detection result that comprehensively reflects the characteristics of the target.
第一方面,本申请实施例提供一种目标检测方法,包括:利用图像数据得到点云数据;利用第一神经网络模型输出点云数据的第一三维信息;第一三维信息包括用于表示图像数据中的至少一个第一目标的至少一个第一立体框的信息,第一立体框的信息包括用于表示第一立体框的位置的第一坐标,第一立体框用于框定第一目标;利用第二神经网络模型输出图像数据的二维信息;二维信息包括用于表示图像数据中的至少一个第二目标的至少一个平面框的信息,平面框的信息包括用于表示平面框的位置的坐标;平面框用于框定第二目标;根据图像数据的深度信息以及图像数据的二维信息确定第二三维信息;第二三维信息包括用于表示图像数据中的至少一个第二目标的至少一个第二立体框的信息,第二立体框的信息包括用于表示第二立体框的位置的第二坐标,第二立体框用于框定第二目标;融合第一三维信息和第二三维信息中的相同目标,得到目标检测结果;其中,在融合第一三维信息和第二三维信息中的相同目标的 过程中,第一三维信息中用于表示图像数据深度的信息的权重大于第二三维信息中用于表示图像数据深度的信息的权重,第一三维信息中用于表示图像数据平面的信息的权重小于第二三维信息中用于表示图像数据平面的信息的权重。这样,可以结合第一三维信息和第二三维信息各自的优势,得到较为准确的目标检测结果,且该目标检测结果是三维结果,能更全面的反映目标的特征。
在一种可能的实现方式中,融合第一三维信息和第二三维信息中的相同目标,得到目标检测结果,包括:利用第三神经网络模型融合第一三维信息和第二三维信息,得到目标检测结果;其中,第三神经网络模型的损失函数中,第一三维信息中用于表示图像数据深度的信息的权重大于第二三维信息中用于表示图像数据深度的信息的权重,第一三维信息中用于表示图像数据平面的信息的权重小于第二三维信息中用于表示图像数据平面的信息的权重。这样,就能够基于第一三维信息和第二三维信息中各自的优势得到较为准确的目标检测结果。
在一种可能的实现方式中,第三神经网络模型的损失函数与下述一项或多项相关:第一神经网络模型的置信度、第二神经网络模型的置信度、第一神经网络模型的输出结果与第一神经网络模型的真实样本的交并比、第二神经网络模型的输出结果与第二神经网络模型的真实样本的交并比、第一神经网络模型中数据的归一化数值或第二神经网络模型中数据的归一化数值。这样,就能够获得更为有效的第三神经网络模型。
在一种可能的实现方式中,第一三维信息包括(X 1Y 1Z 1,W 1H 1L 1),X 1Y 1Z 1为第一坐标,W 1H 1L 1表示第一立体框的长宽高;第二三维信息包括(X 2Y 2Z 2,W 2H 2L 2),X 2Y 2Z 2为第二坐标,W 2H 2L 2表示第二立体框的长宽高;损失函数loss满足下述公式:
loss=f((αX 2+(1-α)X 1),(αY 2+(1-α)Y 1),((1-β)Z 2+βZ 1),(αW 2+(1-α)W 1),(αH 2+(1-α)H 1),((1-β)L 2+βL 1)),其中0.5<α<1,0.5<β<1。这样,可以基于该损失函数,体现第一三维信息和第二三维信息的权重关系,以根据该损失函数得到较为准确的目标检测结果。
在一种可能的实现方式中,利用图像数据得到点云数据包括:对图像数据进行三维重建,得到点云数据。这样,就能够利用三维创建便捷的根据图像数据得到图像对应的点云数据。
在一种可能的实现方式中,图像数据是在自动驾驶过程中拍摄的,根据图像数据的深度信息以及图像数据的二维信息确定第二三维信息,包括:获取自动驾驶过程中拍摄得到的图像数据的相邻图像数据;利用图像数据和图像数据的相邻图像数据计算图像数据的深度信息;融合图像数据的深度信息和图像数据的二维信息,得到第二三维信息。这样,就能够通过图像和图像的相邻信息便捷的得到图像对应的深度信息。
在一种可能的实现方式中,还包括:根据目标检测结果更新高精地图中的地标元素。这样,就能够获得更为实时和准确的高精地图。
在一种可能的实现方式中,根据目标检测结果更新高精地图中的地标元素,包括:确定目标检测结果中用于表示地标的地标检测结果;确定地标检测结果在高精地图中的位置;根据地标检测结果在高精地图中的位置,在高精地图中添加地标。这样,就能够获得更为实时和准确的高精地图。
在一种可能的实现方式中,还包括:根据目标检测结果确定自动驾驶策略。这样, 就能够利用目标检测结果更为准确的指导车辆的自动驾驶。
第二方面,本申请实施例提供一种目标检测装置。
该目标检测装置可为具有目标检测功能的车辆,或者为具有目标检测功能的其他部件。该目标检测装置包括但不限于:车载终端、车载控制器、车载模块、车载模组、车载部件、车载芯片、车载单元或车载摄像头等其他传感器,车辆可通过该车载终端、车载控制器、车载模块、车载模组、车载部件、车载芯片、车载单元或摄像头,实施本申请提供的方法。
该目标检测装置可以为智能终端,或设置在除了车辆之外的其他具有目标检测功能的智能终端中,或设置于该智能终端的部件中。该智能终端可以为智能运输设备、智能家居设备、机器人等其他终端设备。该目标检测装置包括但不限于智能终端或智能终端内的控制器、芯片或摄像头等其他传感器、以及其他部件等。
该目标检测装置可以是一个通用设备或者是一个专用设备。在具体实现中，该装置还可以是台式机、便携式电脑、网络服务器、掌上电脑(personal digital assistant,PDA)、移动手机、平板电脑、无线终端设备、嵌入式设备或其他具有处理功能的设备。本申请实施例不限定该目标检测装置的类型。
该目标检测装置还可以是具有处理功能的芯片或处理器,该目标检测装置可以包括至少一个处理器。处理器可以是一个单核(single-CPU)处理器,也可以是一个多核(multi-CPU)处理器。该具有处理功能的芯片或处理器可以设置在传感器中,也可以不设置在传感器中,而设置在传感器输出信号的接收端。处理器包括但不限于中央处理器(central processing unit,CPU)、图形处理器(graphics processing unit,GPU)、微控制单元(micro control unit,MCU)、微处理器(micro processor unit,MPU)、协处理器中的至少一个。
该目标检测装置还可以是终端设备,也可以是终端设备内的芯片或者芯片系统。该目标检测装置可以包括处理单元和通信单元。当该目标检测装置是终端设备时,该处理单元可以是处理器。该目标检测装置还可以包括存储单元,该存储单元可以是存储器。该存储单元用于存储指令,该处理单元执行该存储单元所存储的指令,以使该终端设备实现第一方面或第一方面的任意一种可能的实现方式中描述的一种目标检测方法。当该目标检测装置是终端设备内的芯片或者芯片系统时,该处理单元可以是处理器。该处理单元执行存储单元所存储的指令,以使该终端设备实现第一方面或第一方面的任意一种可能的实现方式中描述的一种目标检测方法。该存储单元可以是该芯片内的存储单元(例如,寄存器、缓存等),也可以是该终端设备内的位于该芯片外部的存储单元(例如,只读存储器、随机存取存储器等)。
示例性的,处理单元,用于利用图像数据得到点云数据;处理单元,还用于利用第一神经网络模型输出点云数据的第一三维信息;第一三维信息包括用于表示图像数据中的至少一个第一目标的至少一个第一立体框的信息,第一立体框的信息包括用于表示第一立体框的位置的第一坐标,第一立体框用于框定第一目标;处理单元,还用于利用第二神经网络模型输出图像数据的二维信息;二维信息包括用于表示图像数据中的至少一个第二目标的至少一个平面框的信息,平面框的信息包括用于表示平面框的位置的坐标;平面框用于框定第二目标;处理单元,还用于根据图像数据的深度信息 以及图像数据的二维信息确定第二三维信息;第二三维信息包括用于表示图像数据中的至少一个第二目标的至少一个第二立体框的信息,第二立体框的信息包括用于表示第二立体框的位置的第二坐标,第二立体框用于框定第二目标;处理单元,还用于融合第一三维信息和第二三维信息中的相同目标,得到目标检测结果;其中,在融合第一三维信息和第二三维信息中的相同目标的过程中,第一三维信息中用于表示图像数据深度的信息的权重大于第二三维信息中用于表示图像数据深度的信息的权重,第一三维信息中用于表示图像数据平面的信息的权重小于第二三维信息中用于表示图像数据平面的信息的权重。
在一种可能的实现方式中,处理单元,具体用于利用第三神经网络模型融合第一三维信息和第二三维信息,得到目标检测结果;其中,第三神经网络模型的损失函数中,第一三维信息中用于表示图像数据深度的信息的权重大于第二三维信息中用于表示图像数据深度的信息的权重,第一三维信息中用于表示图像数据平面的信息的权重小于第二三维信息中用于表示图像数据平面的信息的权重。
在一种可能的实现方式中,第三神经网络模型的损失函数与下述一项或多项相关:第一神经网络模型的置信度、第二神经网络模型的置信度、第一神经网络模型的输出结果与第一神经网络模型的真实样本的交并比、第二神经网络模型的输出结果与第二神经网络模型的真实样本的交并比、第一神经网络模型中数据的归一化数值或第二神经网络模型中数据的归一化数值。
在一种可能的实现方式中，第一三维信息包括(X₁Y₁Z₁,W₁H₁L₁)，X₁Y₁Z₁为第一坐标，W₁H₁L₁表示第一立体框的长宽高；第二三维信息包括(X₂Y₂Z₂,W₂H₂L₂)，X₂Y₂Z₂为第二坐标，W₂H₂L₂表示第二立体框的长宽高；损失函数loss满足下述公式：
loss=f((αX₂+(1-α)X₁),(αY₂+(1-α)Y₁),((1-β)Z₂+βZ₁),(αW₂+(1-α)W₁),(αH₂+(1-α)H₁),((1-β)L₂+βL₁)),其中0.5<α<1,0.5<β<1。
在一种可能的实现方式中,处理单元,具体用于对图像数据进行三维重建,得到点云数据。
在一种可能的实现方式中,处理单元,具体用于获取自动驾驶过程中拍摄得到的图像数据的相邻图像数据;利用图像数据和图像数据的相邻图像数据计算图像数据的深度信息;融合图像数据的深度信息和图像数据的二维信息,得到第二三维信息。
在一种可能的实现方式中,处理单元,还用于根据目标检测结果更新高精地图中的地标元素。
在一种可能的实现方式中,处理单元,具体用于确定目标检测结果中用于表示地标的地标检测结果;确定地标检测结果在高精地图中的位置;根据地标检测结果在高精地图中的位置,在高精地图中添加地标。
在一种可能的实现方式中,处理单元,还用于根据目标检测结果确定自动驾驶策略。
第三方面,本申请实施例还提供一种传感器系统,用于为车辆提供目标检测功能。其包含至少一个本申请上述实施例提到的目标检测装置,以及,摄像头等其他传感器,该系统内的至少一个传感器装置可以集成为一个整机或设备,或者该系统内的至少一个传感器装置也可以独立设置为元件或装置。
第四方面，本申请实施例还提供一种系统，应用于无人驾驶或智能驾驶中，其包含至少一个本申请上述实施例提到的目标检测装置，以及摄像头等其他传感器中的至少一个，该系统内的至少一个装置可以集成为一个整机或设备，或者该系统内的至少一个装置也可以独立设置为元件或装置。
进一步,上述任一系统可以与车辆的中央控制器进行交互,为车辆驾驶的决策或控制提供探测和/或融合信息。
第五方面,本申请实施例还提供一种终端,终端包括至少一个本申请上述实施例提到的目标检测装置或上述任一系统。进一步,终端可以为智能家居设备、智能制造设备、智能工业设备、智能运输设备(含无人机、车辆等)等。
第六方面，本申请实施例还提供一种芯片，包括至少一个处理器和接口；接口用于为至少一个处理器提供程序指令或者数据；至少一个处理器用于执行程序指令，以实现第一方面或第一方面可能的实现方式中任一方法。
第七方面,本申请实施例提供一种目标检测装置,包括,至少一个处理器,用于调用存储器中的程序,以实现第一方面或第一方面任意可能的实现方式中的任一方法。
第八方面,本申请实施例提供一种目标检测装置,包括:至少一个处理器和接口电路,接口电路用于为至少一个处理器提供信息输入和/或信息输出;至少一个处理器用于运行代码指令,以实现第一方面或第一方面任意可能的实现方式中的任一方法。
第九方面,本申请实施例提供一种计算机可读存储介质,该计算机可读存储介质存储有指令,当指令被执行时,以实现第一方面或第一方面任意可能的实现方式中的任一方法。
应当理解的是,本申请的第二方面至第九方面与本申请的第一方面的技术方案相对应,各方面及对应的可行实施方式所取得的有益效果相似,不再赘述。
附图说明
图1为本申请实施例提供的一种车辆100的功能框图;
图2为本申请实施例提供的一种计算机系统112的结构示意图;
图3为本申请实施例提供的一种芯片硬件结构的示意图;
图4为本申请实施例提供的一种自动驾驶的场景示意图;
图5为本申请实施例提供的一种利用独立的计算设备进行目标检测的场景示意图;
图6为本申请实施例提供的一种目标检测方法的流程示意图;
图7为本申请实施例提供的一种获得目标检测结果的流程示意图;
图8为本申请实施例提供的一种目标检测装置的结构示意图;
图9为本申请实施例提供的一种芯片的结构示意图;
图10为本申请实施例提供的另一种目标检测装置的结构示意图;
图11为本申请实施例提供的一种车辆的结构示意图。
具体实施方式
为了便于清楚描述本申请实施例的技术方案，在本申请的实施例中，采用了“第一”、“第二”等字样对功能和作用基本相同的相同项或相似项进行区分。例如，第一值和第二值仅仅是为了区分不同的值，并不对其先后顺序进行限定。本领域技术人员可以理解“第一”、“第二”等字样并不对数量和执行次序进行限定，并且“第一”、“第二”等字样也并不限定一定不同。
需要说明的是,本申请中,“示例性的”或者“例如”等词用于表示作例子、例证或说明。本申请中被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其他实施例或设计方案更优选或更具优势。确切而言,使用“示例性的”或者“例如”等词旨在以具体方式呈现相关概念。
本申请中,“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B的情况,其中A,B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项(个)”或其类似表达,是指的这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b,或c中的至少一项(个),可以表示:a,b,c,a-b,a-c,b-c,或a-b-c,其中a,b,c可以是单个,也可以是多个。
电子设备执行目标检测(object detection),可以找出图像数据或点云数据中所有感兴趣的目标。例如目标检测可以包含目标定位和目标分类两个子任务,基于该两个子任务可以确定目标的类别和位置。
在电子设备进行目标检测的可能实现方式中,可以包括基于相机实现的目标检测,以及基于雷达实现的目标检测。
例如,基于相机的目标检测中,可以利用相机拍摄得到图像数据,进而电子设备可以识别图像数据中的目标。由于图像数据能够较为准确的反映物体的尺寸,识别图像数据得到的目标的平面信息是相对准确的,因此,通常的实现中,利用相机实现目标检测时,识别图像数据通常可以得到一个或多个平面框的信息,每个平面矩形框能够框定识别的一个目标。
例如,基于雷达的目标检测中,可以利用雷达获取三维点云数据,电子设备识别三维点云数据中的目标。由于雷达能够得到精准的速度位置信息,拥有较长的视野,识别的目标的深度信息是相对准确的,因此通常的实现中,利用雷达实现目标检测时,识别点云信息,通常可以得到一个或多个立体框的信息,每个立体框能够框定识别的一个目标,立体框能够反映目标的深度信息。
然而,在基于相机的目标检测中,因为通常输出的目标检测结果没有深度信息,在利用基于相机的目标检测结果进行避障或路径规划时,不能全面考量目标的立体形态,因此可能不能得到较好的策略。在基于雷达的目标检测中,当雷达位于杂波或地面起伏等环境中,雷达会受到杂波或地面起伏等的影响,导致点云绘制的平面信息不太准确,因此基于雷达的目标检测识别出的目标的效果较差。
综上所述,基于相机的目标检测中图像数据的深度信息的缺失,或者,基于雷达的目标检测中点云数据的平面信息的不准确性,导致基于相机或雷达得到的目标检测结果不准确。
因此，本申请实施例提供一种目标检测方法和装置，可以利用图像数据得到点云数据；利用第一神经网络模型输出点云数据的第一三维信息时，第一三维信息中表示深度的信息较为准确；利用第二神经网络模型输出图像数据的二维信息时，该二维信息中表示平面的信息较为准确，根据图像数据的深度信息将二维信息转换为第二三维信息后，该第二三维信息中表示平面的信息也较为准确；在融合第一三维信息和第二三维信息中的相同目标时，让第一三维信息中用于表示图像数据深度的信息的权重大于第二三维信息中用于表示图像数据深度的信息的权重，第一三维信息中用于表示图像数据平面的信息的权重小于第二三维信息中用于表示图像数据平面的信息的权重，则可以结合第一三维信息和第二三维信息各自的优势，得到较为准确的目标检测结果，且该目标检测结果是三维结果，能更全面地反映目标的特征。
为了更好的理解本申请实施例的方法,下面首先对本申请实施例适用的应用场景进行描述。
可能的实现方式中,本申请实施例的目标检测方法可以应用于自动驾驶、安防或监控等场景。例如,在自动驾驶场景,可以基于本申请实施例的目标检测方法,实现对障碍物等目标的检测,进而基于目标检测结果制定自动驾驶策略等。例如,在安防或监控场景,可以基于本申请实施例的目标检测方法,实现对人物等目标的检测,进而基于目标检测对非法入侵等不安全因素进行告警等。
示例性的,在自动驾驶场景中,本申请实施例的目标检测方法可以应用于车辆,或车辆中的芯片等。
示例性的,图1示出了本申请实施例提供的一种车辆100的功能框图。在一个实施例中,将车辆100配置为完全或部分地自动驾驶模式。例如,当车辆100配置为部分地自动驾驶模式时,车辆100在处于自动驾驶模式时还可通过人为操作来确定车辆及其周边环境的当前状态,例如确定周边环境中的至少一个其他车辆的可能行为,并确定该其他车辆执行可能行为的可能性相对应的置信水平,基于所确定的信息来控制车辆100。例如,在车辆100处于完全地自动驾驶模式中时,可以将车辆100置为不需要与人交互,自动执行驾驶相关操作。
车辆100可包括各种子系统,例如行进系统102、传感器系统104、控制系统106、一个或多个外围设备108以及电源110、计算机系统112和用户接口116。可选地,车辆100可包括更多或更少的子系统,并且每个子系统可包括多个元件。另外,车辆100的每个子系统和元件可以通过有线或者无线互连。
传感器系统104可包括感测关于车辆100周边的环境的信息的若干个传感器。例如,传感器系统104可包括定位系统122(定位系统可以是GPS系统,也可以是北斗系统或者其他定位系统)、惯性测量单元(inertial measurement unit,IMU)124、雷达126、激光测距仪128以及相机130。传感器系统104还可包括被监视车辆100的内部系统的传感器(例如,车内空气质量监测器、燃油量表、机油温度表等)。来自这些传感器中的一个或多个的传感器数据可用于检测对象及其相应特性(位置、形状、方向、速度等)。这种检测和识别是自主车辆100的安全操作的关键功能。
定位系统122可用于估计车辆100的地理位置。IMU 124用于基于惯性加速度来感测车辆100的位置和朝向变化。在一个实施例中,IMU 124可以是加速度计和陀螺 仪的组合。
雷达126可利用无线电信号来感测车辆100的周边环境内的物体。在一些实施例中,除了感测物体以外,雷达126还可用于感测物体的速度和/或前进方向。
激光测距仪128可利用激光来感测车辆100所位于的环境中的物体。在一些实施例中,激光测距仪128可包括一个或多个激光源、激光扫描器以及一个或多个检测器,以及其他系统组件。
相机130可用于捕捉车辆100的周边环境的多个图像。相机130可以是静态相机或视频相机。
控制系统106为控制车辆100及其组件的操作。控制系统106可包括各种元件,其中包括转向系统132、油门134、制动单元136、传感器融合算法138、计算机视觉系统140、路线控制系统142以及障碍物避免系统144。
计算机视觉系统140可以操作来处理和分析由相机130捕捉的图像以便识别车辆100周边环境中的物体和/或特征。物体和/或特征可包括交通信号、道路边界和障碍物。计算机视觉系统140可使用物体识别算法、运动中恢复结构(structure from motion,SFM)算法、视频跟踪和其他计算机视觉技术。在一些实施例中,计算机视觉系统140可以用于为环境绘制地图、跟踪物体、估计物体的速度等等。
路线控制系统142用于确定车辆100的行驶路线。在一些实施例中,路线控制系统142可结合来自传感器138、全球定位系统(global positioning system,GPS)122和一个或多个预定地图的数据以为车辆100确定行驶路线。
障碍物规避系统144用于识别、评估和避开或者以其他方式越过车辆100的环境中的潜在障碍物。
当然,在一个实例中,控制系统106可以增加或替换地包括除了所示出和描述的那些以外的组件。或者也可以减少一部分上述示出的组件。
车辆100通过外围设备108与外部传感器、其他车辆、其他计算机系统或用户之间进行交互。外围设备108可包括无线通信系统146、车载电脑148、麦克风150和/或扬声器152。
在一些实施例中,外围设备108提供车辆100的用户与用户接口116交互的手段。例如,车载电脑148可向车辆100的用户提供信息。用户接口116还可操作车载电脑148来接收用户的输入。车载电脑148可以通过触摸屏进行操作。在其他情况中,外围设备108可提供用于车辆100与位于车内的其它设备通信的手段。例如,麦克风150可从车辆100的用户接收音频(例如,语音命令或其他音频输入)。类似地,扬声器152可向车辆100的用户输出音频。
可能的实现方式中,在车载电脑148的显示屏中,还可以显示根据本申请实施例的目标检测算法跟踪得到的目标,使得用户可以在显示屏中感知车辆周围的环境。
无线通信系统146可以直接地或者经由通信网络来与一个或多个设备无线通信。
车辆100的部分或所有功能受计算机系统112控制。计算机系统112可包括至少一个处理器113,处理器113执行存储在例如数据存储装置114这样的非暂态计算机可读介质中的指令115。计算机系统112还可以是采用分布式方式控制车辆100的个体组件或子系统的多个计算设备。
处理器113可以是任何常规的处理器,诸如商业可获得的中央处理器(central processing unit,CPU)。替选地,该处理器可以是诸如用于供专门应用的集成电路(application specific integrated circuit,ASIC)或其它基于硬件的处理器的专用设备。尽管图1功能性地图示了处理器、存储器、和在相同块中的计算机系统112的其它元件,但是本领域的普通技术人员应该理解该处理器、计算机、或存储器实际上可以包括可以或者可以不存储在相同的物理外壳内的多个处理器、计算机、或存储器。
在此处所描述的各个方面中,处理器可以位于远离该车辆并且与该车辆进行无线通信。在其它方面中,此处所描述的过程中的一些在布置于车辆内的处理器上执行而其它则由远程处理器执行,包括采取执行单一操纵的必要步骤。
在一些实施例中,数据存储装置114可包含指令115(例如,程序逻辑),指令115可被处理器113执行来执行车辆100的各种功能,包括以上描述的那些功能。数据存储装置114也可包含额外的指令,包括向推进系统102、传感器系统104、控制系统106和外围设备108中的一个或多个发送数据、从其接收数据、与其交互和/或对其进行控制的指令。
除了指令115以外,数据存储装置114还可存储数据,例如道路地图、路线信息,车辆的位置、方向、速度以及其它这样的车辆数据,以及其他信息。这种信息可在车辆100在自主、半自主和/或手动模式中操作期间被车辆100和计算机系统112使用。
用户接口116,用于向车辆100的用户提供信息或从其接收信息。可选地,用户接口116可包括在外围设备108的集合内的一个或多个输入/输出设备,例如无线通信系统146、车载电脑148、麦克风150和扬声器152。
计算机系统112可基于从各种子系统(例如,行进系统102、传感器系统104和控制系统106)以及从用户接口116接收的输入来控制车辆100的功能。例如,计算机系统112可利用来自控制系统106的输入以便控制转向单元132来避免由传感器系统104和障碍物避免系统144检测到的障碍物。在一些实施例中,计算机系统112可操作来对车辆100及其子系统的许多方面提供控制。
可选地,上述这些组件中的一个或多个可与车辆100分开安装或关联。例如,数据存储装置114可以部分或完全地与车辆100分开存在。上述组件可以按有线和/或无线方式来通信地耦合在一起。
可选地,上述组件只是一个示例,实际应用中,上述各个模块中的组件有可能根据实际需要增添或者删除,图1不应理解为对本申请实施例的限制。
在道路行进的自动驾驶汽车,如上面的车辆100,可以根据本申请实施例的目标检测方法跟踪其周围环境内的物体以确定自身对当前速度或行驶路线的调整等。该物体可以是其它车辆、交通控制设备、或者其它类型的物体。
除了提供调整自动驾驶汽车的速度或行驶路线的指令之外,计算设备还可以提供修改车辆100的转向角的指令,以使得自动驾驶汽车遵循给定的轨迹和/或维持与自动驾驶汽车附近的障碍物(例如,道路上的相邻车道中的车辆)的安全横向和纵向距离。
上述车辆100可以为轿车、卡车、摩托车、公共汽车、船、飞机、直升飞机、割草机、娱乐车、游乐场车辆、施工设备、电车、高尔夫球车、火车、和手推车等,本申请实施例不做特别的限定。
示例性的,图2为图1中的计算机系统112的结构示意图。
如图2所示,计算机系统112包括处理器113,处理器113和系统总线105耦合。处理器113可以是一个或者多个处理器,其中每个处理器都可以包括一个或多个处理器核。显示适配器(video adapter)107,显示适配器107可以驱动显示器109,显示器109和系统总线105耦合。系统总线105通过总线桥111和输入输出(I/O)总线耦合。I/O接口115和I/O总线耦合。I/O接口115和多种I/O设备进行通信,比如输入设备117(如:键盘,鼠标,触摸屏等),多媒体盘(media tray)121,(例如,CD-ROM,多媒体接口等)。收发器123(可以发送和/或接受无线电通信信号),摄像头155(可以捕捉静态和动态数字视频图像)和外部USB接口125。其中,可选地,和I/O接口115相连接的接口可以是通用串行总线(universal serial bus,USB)接口。
其中,处理器113可以是任何传统处理器,包括精简指令集计算(“RISC”)处理器、复杂指令集计算(“CISC”)处理器或上述的组合。可选地,处理器可以是诸如专用集成电路(“ASIC”)的专用装置。可选地,处理器113可以是神经网络处理器或者是神经网络处理器和上述传统处理器的组合。
可选地,在本文所述的各种实施例中,计算机系统可位于远离自动驾驶车辆的地方,并且可与自动驾驶车辆无线通信。在其它方面,本文所述的一些过程在设置在自动驾驶车辆内的处理器上执行,其它由远程处理器执行,包括采取执行单个操纵所需的动作。
计算机系统112可以通过网络接口129和软件部署服务器149通信。网络接口129是硬件网络接口,比如,网卡。网络127可以是外部网络,比如因特网,也可以是内部网络,比如以太网或者虚拟私人网络(VPN)。可选地,网络127还可以是无线网络,比如WiFi网络,蜂窝网络等。
硬盘驱动接口131和系统总线105耦合。硬盘驱动接口131和硬盘驱动器133相连接。系统内存135和系统总线105耦合。运行在系统内存135的软件可以包括计算机系统112的操作系统(operating system,OS)137和应用程序143。
操作系统包括壳(shell)139和内核(kernel)141。shell 139是介于使用者和操作系统之内核(kernel)间的一个接口。shell是操作系统最外面的一层。shell管理使用者与操作系统之间的交互:等待使用者的输入,向操作系统解释使用者的输入,并且处理各种各样的操作系统的输出结果。
内核141由操作系统中用于管理存储器、文件、外设和系统资源的那些部分组成。直接与硬件交互,操作系统的内核141通常运行进程,并提供进程间的通信,提供CPU时间片管理、中断、内存管理、IO管理等等。
应用程序143包括控制汽车自动驾驶相关的程序，比如，管理自动驾驶的汽车和路上障碍物交互的程序，控制自动驾驶汽车路线或者速度的程序，控制自动驾驶汽车和路上其他自动驾驶汽车交互的程序。应用程序143也存在于软件部署服务器(deploying server)149的系统上。在一个实施例中，在需要执行应用程序143时，计算机系统可以从deploying server149下载应用程序143。
传感器153和计算机系统关联。传感器153用于探测计算机系统112周围的环境。举例来说,传感器153可以探测动物,汽车,障碍物和人行横道等,进一步传感器还 可以探测上述动物,汽车,障碍物和人行横道等物体周围的环境,比如:动物周围的环境,例如,动物周围出现的其他动物,天气条件,周围环境的光亮度等。可选地,如果计算机系统112位于自动驾驶的汽车上,传感器可以是摄像头,红外线感应器,化学检测器,麦克风等。
示例性的,图3为本申请实施例提供的一种芯片硬件结构的示意图。
如图3所示,该芯片可以包括神经网络处理器30。该芯片可以应用于图1所示的车辆中,或图2所示的计算机系统中。
神经网络处理器30可以是神经网络处理器(neural network processing unit,NPU),张量处理器(tensor processing unit,TPU),或者图形处理器(graphics processing unit,GPU)等一切适合用于大规模异或运算处理的处理器。
在一些实现中,运算电路303内部包括多个处理单元(process engine,PE)。在一些实现中,运算电路303是二维脉动阵列。运算电路303还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路303是通用的矩阵处理器。
统一存储器306用于存放输入数据以及输出数据。权重数据直接通过存储单元访问控制器(direct memory access controller,DMAC)305,被搬运到权重存储器302中。输入数据也通过DMAC被搬运到统一存储器306中。
总线接口单元(bus interface unit,BIU)310，用于DMAC和取指存储器(instruction fetch buffer)309的交互；总线接口单元310还用于取指存储器309从外部存储器获取指令；总线接口单元310还用于存储单元访问控制器305从外部存储器获取输入矩阵A或者权重矩阵B的原数据。
控制器304连接的取指存储器(instruction fetch buffer)309,用于存储控制器304使用的指令;
统一存储器306,输入存储器301,权重存储器302以及取指存储器309均为On-Chip存储器。外部存储器独立于该NPU硬件架构。
示例性的,图4为本申请实施例提供的一种自动驾驶的场景示意图。
如图4所示，在自动驾驶场景中，自动驾驶车辆401可以根据其车辆的传感器，如相机、雷达等探测周围的物体。例如，自动驾驶车辆401可以识别到其车辆周围的车辆4022、车辆4032和限速标志4042等其他物体。
在自动驾驶场景中,自动驾驶车辆401可以利用相机拍摄周边物体,自动驾驶车辆中的处理器就可以利用本申请实施例中提供的目标检测方法,识别自动驾驶场景中的物体的目标,得到一个或多个目标对应的一个或多个立体框,每个立体框能够框定自动驾驶车辆401所识别的目标。示例性的,自动驾驶车辆401可以通过立体框4021框定车辆4022,通过立体框4031框定车辆4032,通过立体框4041框定限速标识4042。后续自动驾驶车辆401可以根据识别到的目标和车道数据等其他自动驾驶数据,规划自动驾驶路线,进而保证自动驾驶车辆401的正常驾驶。
示例性的,图5为本申请实施例提供的一种利用独立的计算设备进行目标检测的场景示意图。
示例性的,如图5所示,以独立的计算设备为服务器为例,该场景中可以包含:自 动驾驶车辆501、无线广域网(wide area network,WAN)502、通信网络503和服务器504。
其中,自动驾驶车辆501中可以包含一个或多个相机,或者无线收发器等设备。自动驾驶车辆的无线收发器,能够与该场景中的无线WAN502交换数据并根据需要进行通信。示例性的,自动驾驶车辆501中的自动驾驶系统可以使用无线WAN502经由一个或多个通信网络503(如因特网),将自动驾驶车辆中相机拍摄得到的图像数据,或其他传感器接收到的其他数据传输到服务器504中进行处理。服务器504再将处理后的数据传输到自动驾驶车辆501的自动驾驶系统中,用于指导车辆的自动驾驶。其中,服务器504可以为一个或多个服务器。
下面对本申请实施例中所描述的词汇进行说明。可以理解,该说明是为更加清楚的解释本申请实施例,并不必然构成对本申请实施例的限定。
本申请实施例所描述的相机(camera,或者也可以称为摄像头)可以将物体通过镜头生成的光学图像投射到图像传感器表面上,然后转为电信号,经过数模转换后变为数字图像信号,数字图像信号可以在数字信号处理(digital signal processing,DSP)芯片中加工处理。示例性的,相机可以包括单目相机和双目相机等。
本申请实施例所描述的点云数据可以为:表示一个三维坐标系统中的一组向量的集合。这些向量通常以三维坐标(x维度,y维度和z维度)的形式表示,点云数据主要用来表示目标物体的外表面特性,点云数据中的每一个点中都包含有三维坐标。可以基于较多的点云数据得到目标的尺寸和目标的深度信息,进而结合该点云数据对应的图像数据的平面信息,得到较为准确的目标检测结果。
本申请实施例所描述的深度信息可以包括:表示场景中各点到相机平面的距离,能够反应场景中可见表面的几何形状。利用物体的深度信息和平面信息可以得到该物体的三维表示。
本申请实施例所描述的神经网络可以是:模仿生物神经网络的结构和功能的数学模型或计算模型,用于对函数进行估计或近似。神经网络模型需要利用大量样本进行训练,训练好模型后,就可以利用该神经网络模型进行预测。
本申请实施例所描述的三维重建(3D Reconstruction)可以包括:根据单视图或者多视图的图像数据重建三维信息的过程,三维重建技术可以把真实场景或物体刻画成适合计算机表示和处理的数学模型,以便在计算机环境下对其进行处理、操作和分析。三维重建技术的重点在于获取目标场景或物体的深度信息。在景物深度信息已知的条件下,只需要经过点云数据的配准和融合,就可以实现景物的三维重建。可能的实现方式中,基于深度相机的三维扫描和重建技术,重建出的三维模型通常较完整且真实度较高,因而得到了广泛的应用。
本申请实施例所描述的高精地图(high definition map,HD Map)可以是:面向机器的供自动驾驶汽车使用的地图数据。其可以更加精准的描绘道路交通信息元素,更加真实的反映出道路的实际情况。高精地图能够实现高精度的定位位置功能、道路级和车道级的规划能力、以及车道级的引导能力等能力。
下面以具体地实施例对本申请的技术方案以及本申请的技术方案如何解决上述技术问题进行详细说明。下面这几个具体的实施例可以独立实现,也可以相互结合,对于相同或相似的概念或过程可能在某些实施例中不再赘述。
图6为本申请实施例提供的一种目标检测方法的流程示意图,如图6所示,该方法包括:
S601、利用图像数据得到点云数据。
本申请实施例中,图像数据可以是用于执行本申请实施例的方法的设备或者装置,例如从相机(或摄像头)中获取的。
示例性的,相机可以周期性的拍摄得到图像数据,并将图像数据发送给用于执行本申请实施例的方法的设备或者装置,该设备或者装置可以利用图像数据得到该图像数据对应的点云数据。
示例性的,用于执行本申请实施例的方法的设备可以向相机发送拍摄图像的指令,相机可以在接收到拍摄图像的指令时,拍摄得到图像数据,并将图像数据发送给该设备,该设备可以利用图像数据得到图像数据对应的点云数据。
本申请实施例中,图像数据的数量可以是一个或者多个。利用图像数据得到点云数据的方法可以根据实际应用场景适应设置。其中,图像数据为二维的平面信息,点云数据为三维的立体信息。
示例性的,当图像数据的数量为单个时,利用图像数据得到点云数据的实现可以为,对图像数据进行明暗恢复形状(shape from shading,SFS)的处理,获取单个图像数据中物体表面的明暗关系;根据该明暗关系得到物体表面各像素点的相对高度信息或其他参数信息;利用图像数据中物体的参数信息和图像数据中物体的平面信息;恢复出单个图像数据的三维立体信息;得到该图像数据对应的点云数据。
示例性的,当图像数据的数量为多个时,利用图像数据得到点云数据的实现可以为:获取各图像数据中的特征区域(或指各图像数据中的某一物体对应的区域);根据提取的特征区域建立图像数据对之间的对应关系;利用图像数据对之间的对应关系和相机的参数信息计算图像数据所对应的三维立体信息;得到图像数据对应的点云数据。
可以理解的是,利用图像数据得到点云数据的实现还可以根据实际场景包括其他内容,本申请实施例中对利用图像数据得到点云数据的方式不做限定。
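作为补充说明，下面给出一个由深度信息反投影得到点云数据的示意性代码草图(Python/NumPy)。其中函数名depth_to_point_cloud，以及“深度图已知、相机内参fx、fy、cx、cy已标定”等均为便于说明而作的假设，并非本申请利用图像数据得到点云数据的限定实现。

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """将深度图反投影为相机坐标系下的点云(N, 3)。

    depth为H x W的深度图(单位：米)；fx、fy、cx、cy为相机内参(假设已标定)。
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx        # 按针孔模型由像素坐标和深度恢复三维坐标
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # 去除无效深度(例如0或负值)的点
    return points[points[:, 2] > 0]
```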
S602、利用第一神经网络模型输出点云数据的第一三维信息。
本申请实施例中，该第一三维信息包括用于表示图像数据中的至少一个第一目标的至少一个第一立体框的信息，第一立体框的信息包括用于表示第一立体框的位置的第一坐标；第一立体框用于框定第一目标。其中，图像数据中的第一目标的数量可以为一个或多个，每个第一目标可以对应一个第一立体框。
示例性的,第一神经网络模型是根据点云样本信息训练得到的,利用点云样本数据训练第一神经网络模型的一种可能实现为:在待训练的神经网络模型中输入点云样本数据,利用待训练的神经网络模型输出预测三维信息,利用损失函数比较预测三维信息与真实三维信息的差距,当该模型输出的预测三维信息与真实三维信息的差距不满足损失函数,则调整该模型参数,继续训练;直到模型输出的预测三维信息与真实三维信息的差距满足损失函数,则模型训练结束,得到能够识别点云数据的第一神经网络模型。后续可以将点云数据输入到第一神经网络模型中,能够输出框定第一目标的第一立体框的信息,该第一立体框的信息包括用于表示该第一立体框的位置的第一 坐标。
其中,该点云样本数据可以是对一定数量的点云进行标注得到的。利用第一神经网络模型识别出的第一三维信息中的第一目标的数量与第一神经网络模型的置信度相关。例如,第一神经网络模型的置信度越高,利用第一神经网络模型可以输出的第一三维信息中的第一目标的数量越多,并且识别出的第一目标的准确度越高。
可以理解的是,本申请实施例中对获取点云数据的方式不做限定。该预测三维信息可以为,利用待训练的神经网络模型,输出能够框定点云目标的预测的立体框的信息,该预测的立体框的信息包括用于表示该预测的立体框的位置的预测的三维坐标。
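为便于理解上述训练过程，下面给出一个示意性的训练流程草图(Python/PyTorch)。其中PointBoxNet、train_step等名称以及极简的网络结构均为假设的占位实现，仅用于体现“输入点云样本数据、输出预测三维信息、按损失函数调整模型参数”的过程，并非实际可用的点云检测网络。

```python
import torch
from torch import nn

# 占位的极简网络：输入点云(B, N, 3)，输出一个三维框参数(x, y, z, w, h, l)
class PointBoxNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 6))

    def forward(self, points):                      # points: (B, N, 3)
        return self.mlp(points).max(dim=1).values   # 逐点特征经最大池化得到(B, 6)

model = PointBoxNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.SmoothL1Loss()                         # 度量预测三维信息与真实三维信息的差距

def train_step(points, gt_boxes):
    pred = model(points)
    loss = loss_fn(pred, gt_boxes)                  # 差距不满足要求时继续调整模型参数
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```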
S603、利用第二神经网络模型输出图像数据的二维信息。
本申请实施例中,该二维信息包括用于表示图像数据中的至少一个第二目标的至少一个平面框的信息,该平面框的信息包括用于表示平面框的位置的坐标;该平面框用于框定第二目标。其中,图像数据中的第二目标的数量可以为一个或多个,每个第二目标可以对应一个平面框。
示例性的,第二神经网络模型是根据图像样本数据训练得到的,根据图像样本数据训练第二神经网络模型的一种可能实现为:在待训练的神经网络模型中输入图像样本数据,利用待训练的神经网络模型输出预测二维信息,利用损失函数比较预测二维信息与真实二维信息的差距,当该模型输出的预测二维信息与真实二维信息的差距不满足损失函数,则调整该模型参数,继续训练;直到模型输出的预测二维信息与真实二维信息的差距满足损失函数,则模型训练结束,得到能够识别图像数据的第二神经网络模型。后续可以将图像数据输入到第二神经网络模型中,能够输出框定第二目标的平面框的信息,该平面框的信息包括用于表示该平面框的位置的二维坐标。
其中，该图像样本数据可以是对相机拍摄得到的图像数据进行标注得到的，也可以是对图像数据库中获取的图像数据进行标注得到的。利用第二神经网络模型识别出的二维信息中的第二目标的数量与第二神经网络模型的置信度相关。例如，第二神经网络模型的置信度越高，可以利用第二神经网络模型输出的二维信息中的第二目标的数量越多，并且识别出的第二目标的准确度越高。
可以理解的是,获取图像样本数据也可以根据实际应用场景包括其他内容,本申请实施例中对其他获取图像样本数据的方式不做限定。该预测二维信息可以为利用待训练的神经网络模型,输出能够框定图像目标的预测的平面框的信息,该预测的平面框的信息包括用于表示该预测的平面框的位置的预测的二维坐标。
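下面给出一个利用现成二维检测模型输出图像数据二维信息的示意性草图(Python)。这里以torchvision(0.13及以上版本)提供的Faster R-CNN预训练模型为例，仅作为“第二神经网络模型”的一种可替换示例；图像尺寸、读入方式等均为假设。

```python
import torch
import torchvision

# 加载预训练的二维检测模型，作为第二神经网络模型的一个示例
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)          # 假设已读入并归一化到[0, 1]的图像数据
with torch.no_grad():
    output = model([image])[0]           # 输出字典包含boxes、labels、scores

planar_boxes = output["boxes"]           # 平面框坐标(x1, y1, x2, y2)，用于框定第二目标
scores = output["scores"]                # 各平面框的置信度
```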
S604、根据图像数据的深度信息以及图像数据的二维信息确定第二三维信息。
本申请实施例中,该第二三维信息包括用于表示图像数据中的至少一个第二目标的至少一个第二立体框的信息,该第二立体框的信息包括用于表示第二立体框的位置的第二坐标,第二立体框用于框定第二目标。
示例性的,将图像数据的二维信息转化为三维信息可以是根据该图像数据的深度信息和该图像数据的二维信息确定该图像数据对应的第二三维信息。相机拍摄得到的图像数据,由于相机能够准确的获得物体的尺寸,该图像数据的二维信息中的平面信息是相对准确的;因此,将利用图像数据的深度信息以及图像数据的二维信息,确定得到的图像数据对应的三维信息,该三维信息中的平面信息也是相对准确的。
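下面给出一个根据图像数据的深度信息和二维信息确定第二三维信息的示意性代码草图(Python/NumPy)。其中以平面框内深度的中位数近似目标深度、以相似三角形把像素尺寸换算为物理尺寸、并以L₂≈W₂近似长度方向，均为便于说明而作的简化假设，并非本申请方案的限定实现。

```python
import numpy as np

def lift_box_to_3d(box2d, depth, fx, fy, cx, cy):
    """由平面框与深度信息估计第二立体框(X₂, Y₂, Z₂, W₂, H₂, L₂)的一种示意做法。

    box2d为(x1, y1, x2, y2)；depth为H x W深度图；fx、fy、cx、cy为相机内参。
    """
    x1, y1, x2, y2 = [int(v) for v in box2d]
    patch = depth[y1:y2, x1:x2]
    patch = patch[patch > 0]
    z = float(np.median(patch))                  # 以框内深度中位数近似目标深度Z₂
    u, v = (x1 + x2) / 2.0, (y1 + y2) / 2.0      # 平面框中心
    X = (u - cx) * z / fx
    Y = (v - cy) * z / fy
    W = (x2 - x1) * z / fx                       # 按相似三角形把像素尺寸换算为物理尺寸
    H = (y2 - y1) * z / fy
    L = W                                        # 长度方向缺乏观测，此处粗略假设L₂≈W₂
    return X, Y, z, W, H, L
```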
S605、融合第一三维信息和第二三维信息中的相同目标,得到目标检测结果。
本申请实施例中,在融合第一三维信息和第二三维信息中的相同目标的过程中,第一三维信息中用于表示图像数据深度的信息(或称为深度信息)的权重大于第二三维信息中用于表示图像数据深度的信息的权重,第一三维信息中用于表示图像数据平面的信息(或称为平面信息)的权重小于第二三维信息中用于表示图像数据平面的信息的权重。
本申请实施例中，相同目标是指第一三维信息中的第一目标与第二三维信息中的第二目标对应同一个物体的目标。相同目标的数量可以为一个或多个，每个相同目标中包含一个第一目标和一个第二目标。
示例性的,可以利用第一三维信息中的第一目标与第二三维信息中的第二目标的交叠比例(或称交并比),确定第一三维信息中的第一目标和第二三维信息中的第二目标为相同目标。其中,交叠比例大于或等于设定的阈值。
第一三维信息中的第一目标与第二三维信息中的第二目标的交叠的部分越多(或理解为交叠比例越大)，可以表明第一三维信息中的第一目标与第二三维信息中的第二目标指向的为相同目标。因此，可以在第一三维信息中的第一目标与第二三维信息中的第二目标的交叠比例大于或等于该阈值时，确定第一三维信息和第二三维信息中的相同目标。
在一种可能的实现方式中,在第一三维信息中的第一目标的数量为一个,第二三维信息中的第二目标的数量为一个时,可以在该第一目标与第二目标的交叠比例大于该阈值时,确定第一三维信息和第二三维信息中的相同目标。
在另一种可能的实现方式中,在第一三维信息中的第一目标的数量为多个,第二三维信息中的第二目标的数量为多个的情况下,可以将一个第一三维信息中的第一目标和一个第二三维信息中的第二目标进行组对,并分别计算每对第一目标和第二目标的交叠比例,将交叠比例大于或等于该阈值的每对第一目标和第二目标确定为相同目标,得到第一三维信息和第二三维信息中的相同目标。
可能的实现方式中,如果第一三维信息中的第一目标与第二三维信息中的第二目标的交叠比例小于该阈值时,则认为第一三维信息中的第一目标与第二三维信息中的第二目标对应的不是相同目标。
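下面给出一个按交叠比例(交并比)确定相同目标的示意性代码草图(Python)。其中采用轴对齐三维框的简化交并比计算，且未对配对做一对一约束，函数名iou_3d、match_same_targets均为示例性假设。

```python
def iou_3d(box_a, box_b):
    """轴对齐三维框(x, y, z, w, h, l)的交并比，作为交叠比例的一种简化计算。"""
    def interval(c, s):                           # 由中心和尺寸得到区间[min, max]
        return c - s / 2.0, c + s / 2.0
    inter = 1.0
    for i in range(3):
        a_min, a_max = interval(box_a[i], box_a[i + 3])
        b_min, b_max = interval(box_b[i], box_b[i + 3])
        inter *= max(0.0, min(a_max, b_max) - max(a_min, b_min))
    vol_a = box_a[3] * box_a[4] * box_a[5]
    vol_b = box_b[3] * box_b[4] * box_b[5]
    return inter / (vol_a + vol_b - inter + 1e-9)

def match_same_targets(boxes_1, boxes_2, thr=0.5):
    """将第一三维信息与第二三维信息中交并比不低于阈值thr的目标组成相同目标对。

    注：此处未做一对一约束，实际可结合匈牙利匹配等方式改进。
    """
    pairs = []
    for i, b1 in enumerate(boxes_1):
        for j, b2 in enumerate(boxes_2):
            if iou_3d(b1, b2) >= thr:
                pairs.append((i, j))
    return pairs
```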
本申请实施例中,该目标检测结果可以为目标的三维信息,以及可选的用于表示目标的类型、位置、尺寸或速度等信息,目标检测结果中包含的第一三维信息和第二三维信息中的相同目标可以为一个或多个,本申请实施例对目标检测结果的具体内容和数量不做限定。在自动驾驶的场景中,该第一目标、第二目标或目标检测结果可以为识别出的车辆、行人、路标或障碍物等自动驾驶所需的数据。
示例性的,该融合第一三维信息和第二三维信息中的相同目标的过程可以为:对第一三维信息和第二三维信息中的各维度信息(包括x维度,y维度和z维度)赋予权重,利用两个三维信息中的各维度信息的加权,得到目标检测结果。其中,第一三维信息中用于表示图像数据深度的信息(z维度)的权重大于第二三维信息中用于表示图像数据深度的信息的权重,第一三维信息中用于表示图像数据平面的信息(x维度和y维度)的权重小于第二三维信息中用于表示图像数据平面的信息的权重。
可以理解的是,该融合第一三维信息和第二三维信息的方法可以利用神经网络模型、数学模型、统计学方法或根据实际场景不同所利用的其他方法,本申请实施例中对融合第一三维信息和第二三维信息的方法不做限定。
本申请实施例中,第一三维信息是利用第一神经网络模型输出的第一目标的三维信息,该第一三维信息中的深度信息相对准确;第二三维信息是对二维信息进行转化后得到的三维信息,该第二三维信息中的平面信息相对准确。因此,在融合时,为了保证第一三维信息中深度信息的准确度,该第一三维信息中用于表示图像数据深度的信息的权重较大;为了保证第二三维信息中平面信息的准确度,该第二三维信息中用于表示图像数据平面的信息的权重较大,所以融合得到的目标检测结果能够同时结合两个三维信息的优势信息,可以得到较为准确的目标检测结果。
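下面给出一个按维度加权融合相同目标的示意性代码草图(Python)。其中α、β取0.8仅为落在(0.5,1)区间内的示例数值，函数名fuse_same_target为便于说明而作的假设。

```python
def fuse_same_target(box1, box2, alpha=0.8, beta=0.8):
    """按维度加权融合相同目标：深度相关维度偏向第一三维信息，平面相关维度偏向第二三维信息。

    box1 = (X₁, Y₁, Z₁, W₁, H₁, L₁)来自点云分支；box2 = (X₂, Y₂, Z₂, W₂, H₂, L₂)来自图像分支。
    """
    X1, Y1, Z1, W1, H1, L1 = box1
    X2, Y2, Z2, W2, H2, L2 = box2
    X = alpha * X2 + (1 - alpha) * X1       # 平面信息：第二三维信息权重更大
    Y = alpha * Y2 + (1 - alpha) * Y1
    W = alpha * W2 + (1 - alpha) * W1
    H = alpha * H2 + (1 - alpha) * H1
    Z = beta * Z1 + (1 - beta) * Z2         # 深度信息：第一三维信息权重更大
    L = beta * L1 + (1 - beta) * L2
    return X, Y, Z, W, H, L
```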
在图6对应的实施例的基础上，可能的实现方式中，S605包括：利用第三神经网络模型融合第一三维信息和第二三维信息中的相同目标，得到目标检测结果；其中，第三神经网络模型的损失函数中，第一三维信息中用于表示图像数据深度的信息的权重大于第二三维信息中用于表示图像数据深度的信息的权重，第一三维信息中用于表示图像数据平面的信息的权重小于第二三维信息中用于表示图像数据平面的信息的权重。
本申请实施例中,该第三神经网络模型的一种可能实现为,在待训练的神经网络模型中输入两种样本数据,两种样本数据可以分别为:对第一神经网络模型输出的数据进行标注得到的样本数据,以及对第二神经网络模型输出的数据叠加深度信息生成的数据进行标注得到的样本数据;利用待训练的神经网络模型输出两种样本数据的预测的融合结果;利用损失函数比较预测的融合结果与真实结果的差距,当该模型输出的预测的融合结果与真实结果的差距不满足损失函数,则调整该模型参数,继续训练;直到模型输出的预测的融合结果与真实结果之间的差距满足损失函数,则模型训练结束;得到能够输出较为准确的目标检测结果的第三神经网络模型。后续可以将第一三维信息和第二三维信息中的相同目标输入到第三神经网络模型中,能够输出融合两个三维信息中的相同目标后的目标检测结果。
示例性的,图7示出了一种获得目标检测结果的流程示意图。
本申请实施例中,如图7所示,将第一神经网络模型输出的第一三维信息,以及第二神经网络模型输出的二维信息进行三维转化得到的第二三维信息,输入到第三神经网络模型中,利用第三神经网络模型可以输出目标检测结果。
示例性的，该第一三维信息包括(X₁Y₁Z₁,W₁H₁L₁)，X₁Y₁Z₁为第一坐标，W₁H₁L₁表示第一立体框的长宽高；第二三维信息包括(X₂Y₂Z₂,W₂H₂L₂)，X₂Y₂Z₂为第二坐标，W₂H₂L₂表示第二立体框的长宽高。该损失函数loss可以满足下述公式：
loss=f((αX₂+(1-α)X₁),(αY₂+(1-α)Y₁),((1-β)Z₂+βZ₁),(αW₂+(1-α)W₁),(αH₂+(1-α)H₁),((1-β)L₂+βL₁)),其中0.5<α<1,0.5<β<1。
其中，α为第二三维信息中平面信息(X₂Y₂,W₂H₂)的权重，1-α为第一三维信息中平面信息(X₁Y₁,W₁H₁)的权重；β为第一三维信息中深度信息(Z₁,L₁)的权重，1-β为第二三维信息中深度信息(Z₂,L₂)的权重。
示例性的，在该公式中，当α的取值处于(0.5,1)的范围时，对应的1-α的取值处于(0,0.5)的范围，此时第二三维信息中的平面信息(X₂Y₂,W₂H₂)所占的权重较高，体现了第二三维信息中平面信息的准确度较高；当β的取值处于(0.5,1)的范围时，对应的1-β的取值处于(0,0.5)的范围，此时第一三维信息中的深度信息(Z₁,L₁)所占的权重较高，体现了第一三维信息中深度信息的准确度较高。
基于此,利用神经网络模型融合第一三维信息和第二三维信息中的相同目标,就能够得到更加准确的目标检测结果。
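下面给出上述损失函数的一种示意性实现草图(Python/PyTorch)。由于公式中的f(·)在本文中未具体限定，这里仅以“将加权组合作为回归目标、用Smooth L1度量差距”作为一种假设性的选择，函数名fusion_loss亦为示例。

```python
import torch

def fusion_loss(pred, box1, box2, alpha=0.8, beta=0.8):
    """第三神经网络模型损失函数的一种示意实现：以加权融合后的框作为回归目标。

    pred / box1 / box2均为形如(..., 6)的张量，6个分量依次为(X, Y, Z, W, H, L)。
    """
    X1, Y1, Z1, W1, H1, L1 = box1.unbind(-1)
    X2, Y2, Z2, W2, H2, L2 = box2.unbind(-1)
    target = torch.stack([
        alpha * X2 + (1 - alpha) * X1,
        alpha * Y2 + (1 - alpha) * Y1,
        (1 - beta) * Z2 + beta * Z1,
        alpha * W2 + (1 - alpha) * W1,
        alpha * H2 + (1 - alpha) * H1,
        (1 - beta) * L2 + beta * L1,
    ], dim=-1)
    # f(·)未在本文中限定，这里以Smooth L1作为一种示例性选择
    return torch.nn.functional.smooth_l1_loss(pred, target)
```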
在图6对应的实施例的基础上,可能的实现方式中,第三神经网络模型的损失函数与下述一项或多项相关:第一神经网络模型的置信度、第二神经网络模型的置信度、第一神经网络模型的输出结果与第一神经网络模型的真实样本的交并比、第二神经网络模型的输出结果与第二神经网络模型的真实样本的交并比、第一神经网络模型中数据的归一化数值或第二神经网络模型中数据的归一化数值。
本申请实施例中,第一神经网络模型的置信度可以为,利用第一神经网络模型输出预测的第一三维信息的准确率;第二神经网络模型的置信度可以为,利用第二神经网络模型输出预测的二维信息的准确率。例如,利用神经网络模型识别车辆,则置信度表示利用该神经网络模型识别出物体的分类为车辆的准确率。
示例性的，当第三神经网络模型的损失函数，与第一神经网络模型的置信度Cᵢ以及第二神经网络模型的置信度Cₚ相关时，该损失函数loss满足下述公式：
loss=f((CᵢX₂+(1-Cᵢ)X₁),(CᵢY₂+(1-Cᵢ)Y₁),((1-Cₚ)Z₂+CₚZ₁),(CᵢW₂+(1-Cᵢ)W₁),(CᵢH₂+(1-Cᵢ)H₁),((1-Cₚ)L₂+CₚL₁)),其中0.5<Cᵢ<1,0.5<Cₚ<1。
本申请实施例中，第一神经网络的输出结果与第一神经网络模型的真实样本的交并比可以理解为，利用第一神经网络模型可以输出的目标的预测立体框，真实样本中可以通过真实立体框框定样本中的目标，该交并比表示该预测立体框与该真实立体框的交叠比例；第二神经网络的输出结果与第二神经网络模型的真实样本的交并比可以理解为，利用第二神经网络模型可以输出的目标的预测平面框，真实样本中可以通过真实平面框框定样本中的目标，该交并比表示该预测平面框与该真实平面框的交叠比例。
示例性的，当第三神经网络模型的损失函数，与第一神经网络模型的输出结果与第一神经网络模型的真实样本的交并比IoUᵢ、第二神经网络模型的输出结果与第二神经网络模型的真实样本的交并比IoUₚ相关时，该损失函数loss满足下述公式：
loss=f((IoUᵢX₂+(1-IoUᵢ)X₁),(IoUᵢY₂+(1-IoUᵢ)Y₁),((1-IoUₚ)Z₂+IoUₚZ₁),(IoUᵢW₂+(1-IoUᵢ)W₁),(IoUᵢH₂+(1-IoUᵢ)H₁),((1-IoUₚ)L₂+IoUₚL₁)),其中0.5<IoUᵢ<1,0.5<IoUₚ<1。
本申请实施例中,第一神经网络模型中数据的归一化数值表示为,对第一神经网络模型中输入的点云数据进行归一化处理的数值,使点云数据可以映射到一个特定区间内;第二神经网络模型中数据的归一化数值表示为,对第二神经网络模型中输入的图像数据进行归一化处理的数值,使图像数据可以映射到一个特定区间内。
示例性的，当第三神经网络模型的损失函数，与第一神经网络模型中数据的归一化数值Eᵢ，以及第二神经网络模型中数据的归一化数值Eₚ相关时，该损失函数loss满足下述公式：
loss=f((EᵢX₂+(1-Eᵢ)X₁),(EᵢY₂+(1-Eᵢ)Y₁),((1-Eₚ)Z₂+EₚZ₁),(EᵢW₂+(1-Eᵢ)W₁),(EᵢH₂+(1-Eᵢ)H₁),((1-Eₚ)L₂+EₚL₁)),其中0.5<Eᵢ<1,0.5<Eₚ<1。
基于此,就可以利用损失函数和第三神经网络模型,得到更为准确的目标检测结果。
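下面给出一个把置信度、交并比或归一化数值代入上述权重位置的简单用法示意(Python)，其中clamp_weight及具体数值均为假设，仅用于说明这些数值需落在(0.5,1)区间内再参与加权。

```python
def clamp_weight(w, low=0.5, high=1.0, eps=1e-3):
    """将置信度、交并比或归一化数值压缩到开区间(0.5, 1)内，作为损失函数中的权重使用。"""
    return min(max(w, low + eps), high - eps)

# 示意：按上文公式，把置信度C、交并比IoU或归一化数值E代入α、β所在的位置
alpha = clamp_weight(0.92)   # 例如代入α位置的置信度/交并比/归一化数值(数值仅为示例)
beta = clamp_weight(0.87)    # 例如代入β位置的置信度/交并比/归一化数值(数值仅为示例)
# 随后可将alpha、beta作为前文示意的fusion_loss等损失实现的权重参数使用
```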
在图6对应的实施例的基础上,可能的实现方式中,S601包括:对图像数据进行三维重建,得到点云数据。
示例性的,该三维重建的步骤可以包括但不限于如下步骤:
S6011、利用相机采集图像数据。
本申请实施例中,相机可以为双目相机,双目相机可以采集得到不同角度的图像对数据,通过总线接口传输到自动驾驶车辆的处理设备中进行处理。其中,可以利用自动驾驶车辆的处理设备(如处理器)对图像数据进行处理,也可以利用服务器上独立的计算设备对相机拍摄得到的图像数据进行处理。
S6012、标定相机,获得相机的参数信息。
示例性的,对自动驾驶车辆中的双目相机进行标定时,可以采集多幅标定板图像数据,通过阈值分割找到标定板的内部区域;通过亚像素边缘提取方法得到标定板各个圆点的边缘,通过最小二乘圆拟合获取圆点的圆心坐标,确定圆心坐标与它们在图像数据中投影之间的对应关系以及标定板与相机之间大致的位置关系,为相机的外参初始值;调用Halcon库函数确定两个相机的内参数、外参数以及两个相机之间的相对位置关系;通过多次测量取平均值确定双目相机的相关参数信息。
S6013、对图像数据进行立体矫正。
示例性的,将从双目相机获取的图像数据校正为标准极线几何结构,并得到校正后的双目相机的相机参数。
S6014、图像数据预处理。
示例性的,预处理可以包括:把经过矫正后的彩色图像数据对通过加权平均法转化为灰度图像数据;进行直方图均衡化操作处理,使得图像数据的灰度分布趋向平均,图像数据所占有的像素灰度间距拉开,加大图像数据的反差,达到图像数据增强的目的;全局阈值分割,获取图像数据中感兴趣的区域。
S6015、立体匹配。
示例性的,通过极线约束及双目相机与目标物体的距离估计出的视差搜索空间,从而减少匹配的搜索范围;通过多重网格技术引进粗网格系列加速偏微分方程的收敛,提高匹配速度;通过细网格迭代,将残差从最细网格依次限制到粗糙的网格中,运用像素的灰度、梯度及平滑度相结合的相似度判断准则在粗网格搜索空间内寻找匹配点,得到视差值;将粗网格得到的视差值依次延拓到细网格,通过组合修正得到最终匹配点的视差值;按照以上步骤在整幅图像数据上进行遍历,直到得到完整连续的视差图。
S6016、点云重建。
示例性的,通过双目立体系统深度恢复原理,获取图像数据每个点的三维空间坐标,得到图像数据对应的点云数据;对点云数据进行基于移动最小二乘法的平滑滤波,获取平滑后的点云数据。
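作为上述S6013-S6016的一种简化替代流程，下面给出一个基于OpenCV的双目立体匹配与点云重建示意草图(Python)。其中left.png、right.png、Q.npy等文件名，以及“左右图已完成立体矫正、重投影矩阵Q已在标定阶段保存”均为假设；SGBM参数亦仅为示例取值。

```python
import cv2
import numpy as np

# 读入已经过立体矫正的左右灰度图(文件名为假设)
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# 半全局立体匹配得到视差图；SGBM输出为16倍定点视差，需除以16
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0

Q = np.load("Q.npy")                                # 假设标定/矫正阶段已保存重投影矩阵Q
points_3d = cv2.reprojectImageTo3D(disparity, Q)    # H x W x 3的三维坐标
mask = disparity > 0
point_cloud = points_3d[mask]                       # 图像数据对应的点云数据(N, 3)
```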
可以理解的是,对图像数据进行三维重建得到点云数据的方法也可以根据实际应用场景包括其他内容,本申请实施例中对其他可以对图像数据进行三维重建获取点云数据的方法不做限定。
基于此，就能够在自动驾驶的场景中，更加便捷地对自动驾驶车辆拍摄得到的图像数据进行处理，得到图像数据对应的点云数据。
在图6对应的实施例的基础上,可能的实现方式中,在自动驾驶过程中,可以利用单目相机拍摄一系列连续的图像数据,S604包括:获取自动驾驶过程中拍摄得到的图像数据的相邻图像数据;利用图像数据和图像数据的相邻图像数据计算图像数据的深度信息;融合图像数据的深度信息和图像数据的二维信息,得到第二三维信息。
本申请实施例中,单目相机可以是具有一个普通RGB摄像头的相机,其可以拍摄得到连续的图像数据。可以理解的是,该单目相机也可以为能够连续拍摄的拍摄设备,例如内置在车辆中的照相机等,本申请实施例中对单目相机的种类不做限制。
示例性的,可以利用单目相机拍摄得到的多个图像数据,以任一个图像数据和图像数据的相邻图像数据为例,计算图像数据的深度信息,过程可以包括:
根据图像数据和图像数据的相邻图像数据中的各图像数据的焦距和景深信息,获得像素点与单目相机镜头之间的距离范围和景深信息;对图像数据和图像数据的相邻图像数据进行对比匹配,对图像数据中显示为同一像素点建立映射关系;根据各图像数据的焦距、光圈、景深等信息,以及各图像数据的像素点与相机镜头之间的距离范围,通过平均值和其他方法计算出像素点与相机镜头间的准确距离;根据图像数据和图像数据的相邻图像数据进行对比匹配得到的清晰的像素点,以及像素点与相机镜头间的准确距离,从而获得图像数据中每个像素点的深度信息,后续融合图像数据的二维信息和图像数据的深度信息,可以得到第二三维信息。
其中，若图像数据和图像数据的相邻图像数据间的深度差值大于某个阈值时，可以通过插值及其他的转换处理算法进行深度信息的拟合。该景深信息表示，在拍摄图像数据的对焦过程中，通过镜头将在焦平面上清晰成像，所获得的包含图像数据的清晰的区间的信息。
该融合图像数据的深度信息和图像数据的二维信息可以为:对图像数据的二维信息中各像素点赋予该像素点对应的深度信息,得到第二三维信息。
基于此,就能够更加便捷的利用相机获取图像数据对应的三维信息。
在图6对应的实施例的基础上,可能的实现方式中,根据目标检测结果更新高精地图中的地标元素。
本申请实施例中,高精地图不仅可以描绘道路,还能够反映出每个道路中包含的车辆情况,能够更真实的反映驾驶过程中道路的实际样式。
示例性的,高精地图中的地标元素可以为标识地理位置的元素,该地标元素可以包括:每条车道和车道之间的车道线元素、道路标识牌元素、路灯元素、障碍物元素、绿化带元素等能够标识物体所在地理位置的地标元素。另外,高精地图中还可以包括更为丰富的道路信息元素,例如:道路形状以及每个车道的坡度、曲率、和侧倾的数据,每条车道和车道之间的车道线的样式,道路的限高情况,道路上的箭头和文字的内容等自动驾驶所需的道路信息元素。
利用本申请实施例中获得的目标检测结果,与高精地图中目标检测结果所在的位置相对照,可以更新高精地图中相应位置的地标元素。
基于此,将目标检测结果更新到高精地图中的地标元素后,就能够得到更为准确的高精地图,进而更加准确的指导车辆的自动驾驶。
在图6对应的实施例的基础上,可能的实现方式中,根据目标检测结果更新高精地图中的地标元素,包括:确定目标检测结果中用于表示地标的地标检测结果;确定地标检测结果在高精地图中的位置;根据地标检测结果在高精地图中的位置,在高精地图中添加地标。
本申请实施例中,确定目标检测结果中用于表示地标的地标检测结果可以为,在本申请实施例中获得的目标检测结果中,识别表示地标的地标检测结果。该识别地标检测结果的方式可以为,训练能够识别地标检测结果的神经网络模型,利用该神经网络模型识别目标检测结果中的物体的分类为地标的地标检测结果。该地标检测结果可以包括:该地标的位置信息和该地标的尺寸信息等其他类型的地标信息。
本申请实施例中,确定地标检测结果在高精地图中的位置可以为,确定地标检测结果的位置信息,将地标检测结果的位置信息与高精地图中的位置信息相对照,得到地标检测结果在高精地图中的具体位置。
本申请实施例中,根据地标检测结果在高精地图中的位置,在高精地图中添加地标可以为,若该地标检测结果在高精地图中所在的位置上不含有该地标元素,则在高精地图中添加该地标检测结果;若该地标检测结果在高精地图中所在的位置上含有地标元素,且高精地图中的地标与本申请实施例检测的地标结果不同时,可以用本申请实施例的地标检测结果替换高精地图中的地标。
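下面给出上述添加/替换地标逻辑的一个示意性代码草图(Python)。其中以字典按位置键索引地标元素的高精地图数据结构、函数名update_landmark以及字段名position等均为便于说明而作的假设。

```python
def update_landmark(hd_map, landmark):
    """根据地标检测结果更新高精地图的一种示意逻辑。

    hd_map: 以位置键索引地标元素的字典(示例性的数据结构)；
    landmark: 包含'position'(已对照到高精地图坐标)与'type'、'size'等字段的地标检测结果。
    """
    key = landmark["position"]
    existing = hd_map.get(key)
    if existing is None:
        hd_map[key] = landmark        # 该位置不含地标元素：直接添加
    elif existing != landmark:
        hd_map[key] = landmark        # 与检测结果不同：用地标检测结果替换
    return hd_map
```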
基于此,将地标检测结果更新在高精地图所在的位置后,就能够得到更为实时和准确的高精地图,能够更加准确的指导车辆的自动驾驶。
在图6对应的实施例的基础上,可能的实现方式中,还包括:根据目标检测结果确定自动驾驶策略。
本申请实施例中,自动驾驶策略可以为,指导自动驾驶车辆运行的方式。其中,高精地图可以作为自动驾驶环节中确定自动驾驶策略的重要参考依据。
示例性的,自动驾驶策略可以包括:指导自动驾驶车辆拐弯、变道、变速、为其他车辆或行人让行等其他自动驾驶策略。
示例性的，当自动驾驶车辆检测到当前场景中的目标检测结果满足可以执行车辆变道的需求时，自动驾驶车辆将基于自动驾驶场景中车辆的实时变化情况，指示自动驾驶车辆执行变道策略。若当前自动驾驶车辆在执行变道策略的过程中，自动驾驶车辆前方存在一个忽然减速的车辆，导致自动驾驶车辆无法继续满足自身的变道需求，则自动驾驶车辆可以放弃变道策略，在本车道线内继续驾驶。
示例性的,当自动驾驶车辆检测到当前场景中的目标检测结果显示,自动驾驶车辆前方有行人正在穿越道路时,自动驾驶车辆执行让行策略,停止行驶,等待行人穿行,等到行人穿越马路后,自动驾驶车辆继续行驶。
可以理解的是,自动驾驶策略根据实际场景的不同还包括其他内容,本申请实施例中对此不做限定。
基于此,目标检测结果能够识别出当前自动驾驶场景中复杂的道路情况,自动驾驶车辆就能够基于目标检测结果更加准确的进行自动驾驶。
通过上述对本申请方案的介绍,可以理解的是,上述实现各设备为了实现上述功能,其包含了执行各个功能相应的硬件结构和/或软件单元。本领域技术人员应该很容易意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
示例性的,图8为本申请实施例提供的一种目标检测装置,该目标检测的装置包括处理器800和存储器801。
处理器800负责管理总线架构和通常的处理,存储器801可以存储处理器800在执行操作时所使用的数据。
总线架构可以包括任意数量的互联的总线和桥，具体由处理器800代表的一个或多个处理器和存储器801代表的存储器的各种电路链接在一起。总线架构还可以将诸如外围设备、稳压器和功率管理电路等之类的各种其他电路链接在一起，这些都是本领域所公知的，因此，本文不再对其进行进一步描述。总线接口提供接口。
本申请实施例揭示的流程,可以应用于处理器800中,或者由处理器800实现。在实现过程中,目标检测的流程的各步骤可以通过处理器800中的硬件的集成逻辑电路或者软件形式的指令完成。处理器800可以是通用处理器、数字信号处理器、专用集成电路、现场可编程门阵列或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件,可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器801,处理器800读取存储器801中的信息,结合其硬件完成信号处理流程的步骤。
在本申请实施例一种可选的方式中，处理器800用于读取存储器801中的程序，并执行如图6所示的S601-S605中的方法流程。
图9为本申请实施例提供的一种芯片的结构示意图。芯片900包括一个或多个处理器901以及接口电路902。可选的,芯片900还可以包含总线903。其中:
处理器901可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过处理器901中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器901可以是通用处理器、数字通信器(DSP)、专用集成电路(ASIC)、现场可编程门阵列(FPGA)或者其它可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件、MCU、MPU、CPU或者协处理器中的一个或多个。可以实现或者执行 本申请实施例中的公开的各方法、步骤。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
接口电路902可以用于数据、指令或者信息的发送或者接收,处理器901可以利用接口电路902接收的数据、指令或者其它信息,进行加工,可以将加工完成信息通过接口电路902发送出去。
可选的,芯片还包括存储器,存储器可以包括只读存储器和随机存取存储器,并向处理器提供操作指令和数据。存储器的一部分还可以包括非易失性随机存取存储器(NVRAM)。
可选的,存储器存储了可执行软件模块或者数据结构,处理器可以通过调用存储器存储的操作指令(该操作指令可存储在操作系统中),执行相应的操作。
可选的,芯片可以使用在本申请实施例涉及的目标检测装置中。可选的,接口电路902可用于输出处理器901的执行结果。关于本申请的一个或多个实施例提供的目标检测方法可参考前述各个实施例,这里不再赘述。
需要说明的,处理器901、接口电路902各自对应的功能既可以通过硬件设计实现,也可以通过软件设计来实现,还可以通过软硬件结合的方式来实现,这里不作限制。
如图10所示,本申请实施例提供一种目标检测的装置,所述装置包括至少一个处理单元1001。
本申请实施例还提供一种电子设备,在采用对应各个功能划分各个功能模块的情况下,该电子设备包括:处理单元,用于支持目标检测装置执行上述实施例中的步骤,例如可以执行S601至S605的操作,或者本申请实施例所描述的技术的其他过程。
当然,目标检测装置包括但不限于上述所列举的单元模块。并且,上述功能单元的具体所能够实现的功能也包括但不限于上述实例所述的方法步骤对应的功能,电子设备的其他单元的详细描述可以参考其所对应方法步骤的详细描述,本申请实施例这里不予赘述。
在采用集成的单元的情况下,上述实施例中所涉及的电子设备可以包括:处理单元、存储单元和通信单元。存储单元,用于保存电子设备的程序代码和数据。该通信单元用于支持电子设备与其他网络实体的通信,以实现电子设备的通话,数据交互,Internet访问等功能。
其中,处理单元用于对电子设备的动作进行控制管理。处理单元可以是处理器或控制器。通信单元可以是收发器、RF电路或通信接口等。存储单元可以是存储器。
进一步的,该电子设备还可以包括输入单元和显示单元。显示单元可以是屏幕或显示器。输入单元可以是触摸屏,语音输入装置,或指纹传感器等。
示例性的,处理单元1001,用于利用图像数据得到点云数据;处理单元,还用于利用第一神经网络模型输出点云数据的第一三维信息;第一三维信息包括用于表示图像数据中的至少一个第一目标的至少一个第一立体框的信息,第一立体框的信息包括用于表示第一立体框的位置的第一坐标,第一立体框用于框定第一目标。
处理单元1001,还用于利用第二神经网络模型输出图像数据的二维信息;二维信息包括用于表示图像数据中的至少一个第二目标的至少一个平面框的信息,平面框的 信息包括用于表示平面框的位置的坐标;平面框用于框定第二目标。
处理单元1001,还用于根据图像数据的深度信息以及图像数据的二维信息确定第二三维信息;第二三维信息包括用于表示图像数据中的至少一个第二目标的至少一个第二立体框的信息,第二立体框的信息包括用于表示第二立体框的位置的第二坐标,第二立体框用于框定第二目标。
处理单元1001,还用于融合第一三维信息和第二三维信息中的相同目标,得到目标检测结果;其中,在融合第一三维信息和第二三维信息中的相同目标的过程中,第一三维信息中用于表示图像数据深度的信息的权重大于第二三维信息中用于表示图像数据深度的信息的权重,第一三维信息中用于表示图像数据平面的信息的权重小于第二三维信息中用于表示图像数据平面的信息的权重。
在一种可能的实现方式中,处理单元,具体用于利用第三神经网络模型融合第一三维信息和第二三维信息,得到目标检测结果;其中,第三神经网络模型的损失函数中,第一三维信息中用于表示图像数据深度的信息的权重大于第二三维信息中用于表示图像数据深度的信息的权重,第一三维信息中用于表示图像数据平面的信息的权重小于第二三维信息中用于表示图像数据平面的信息的权重。
在一种可能的实现方式中,第三神经网络模型的损失函数与下述一项或多项相关:第一神经网络模型的置信度、第二神经网络模型的置信度、第一神经网络模型的输出结果与第一神经网络模型的真实样本的交并比、第二神经网络模型的输出结果与第二神经网络模型的真实样本的交并比、第一神经网络模型中数据的归一化数值或第二神经网络模型中数据的归一化数值。
在一种可能的实现方式中，第一三维信息包括(X₁Y₁Z₁,W₁H₁L₁)，X₁Y₁Z₁为第一坐标，W₁H₁L₁表示第一立体框的长宽高；第二三维信息包括(X₂Y₂Z₂,W₂H₂L₂)，X₂Y₂Z₂为第二坐标，W₂H₂L₂表示第二立体框的长宽高；损失函数loss满足下述公式：
loss=f((αX₂+(1-α)X₁),(αY₂+(1-α)Y₁),((1-β)Z₂+βZ₁),(αW₂+(1-α)W₁),(αH₂+(1-α)H₁),((1-β)L₂+βL₁)),其中0.5<α<1,0.5<β<1。
在一种可能的实现方式中,处理单元,具体用于对图像数据进行三维重建,得到点云数据。
在一种可能的实现方式中,处理单元,具体用于获取自动驾驶过程中拍摄得到的图像数据的相邻图像数据;利用图像数据和图像数据的相邻图像数据计算图像数据的深度信息;融合图像数据的深度信息和图像数据的二维信息,得到第二三维信息。
在一种可能的实现方式中,处理单元,还用于根据目标检测结果更新高精地图中的地标元素。
在一种可能的实现方式中,处理单元,具体用于确定目标检测结果中用于表示地标的地标检测结果;确定地标检测结果在高精地图中的位置;根据地标检测结果在高精地图中的位置,在高精地图中添加地标。
在一种可能的实现方式中,处理单元,还用于根据目标检测结果确定自动驾驶策略。
如图11所示，本申请提供一种车辆，该车辆包括至少一个摄像器1101、至少一个存储器1102以及至少一个处理器1103。
摄像器1101,用于获取图像,图像用于得到摄像头目标检测结果。
存储器1102,用于存储一个或多个程序以及数据信息;其中一个或多个程序包括指令。
处理器1103,用于利用图像数据得到点云数据;利用第一神经网络模型输出点云数据的第一三维信息;第一三维信息包括用于表示图像数据中的至少一个第一目标的至少一个第一立体框的信息,第一立体框的信息包括用于表示第一立体框的位置的第一坐标,第一立体框用于框定第一目标;利用第二神经网络模型输出图像数据的二维信息;二维信息包括用于表示图像数据中的至少一个第二目标的至少一个平面框的信息,平面框的信息包括用于表示平面框的位置的坐标;平面框用于框定第二目标;根据图像数据的深度信息以及图像数据的二维信息确定第二三维信息;第二三维信息包括用于表示图像数据中的至少一个第二目标的至少一个第二立体框的信息,第二立体框的信息包括用于表示第二立体框的位置的第二坐标,第二立体框用于框定第二目标;融合第一三维信息和第二三维信息中的相同目标,得到目标检测结果;其中,在融合第一三维信息和第二三维信息中的相同目标的过程中,第一三维信息中用于表示图像数据深度的信息的权重大于第二三维信息中用于表示图像数据深度的信息的权重,第一三维信息中用于表示图像数据平面的信息的权重小于第二三维信息中用于表示图像数据平面的信息的权重。
在一些可能的实施方式中,本申请实施例提供的目标检测的方法的各个方面还可以实现为一种程序产品的形式,其包括程序代码,当程序代码在计算机设备上运行时,程序代码用于使计算机设备执行本说明书中描述的根据本申请各种示例性实施方式的目标检测的方法中的步骤。
程序产品可以采用一个或多个可读介质的任意组合。可读介质可以是可读信号介质或者可读存储介质。可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。可读存储介质的更具体的例子(非穷举的列表)包括:具有一个或多个导线的电连接、便携式盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。
根据本申请的实施方式的用于目标检测的程序产品,其可以采用便携式紧凑盘只读存储器(CD-ROM)并包括程序代码,并可以在服务器设备上运行。然而,本申请的程序产品不限于此,在本文件中,可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被通信传输、装置或者器件使用或者与其结合使用。
可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了可读程序代码。这种传播的数据信号可以采用多种形式,包括——但不限于——电磁信号、光信号或上述的任意合适的组合。可读信号介质还可以是可读存储介质以外的任何可读介质,该可读介质可以发送、传播或者传输用于由周期网络动作系统、装置或者器件使用或者与其结合使用的程序。
可读介质上包含的程序代码可以用任何适当的介质传输,包括——但不限于——无线、有线、光缆、RF等,或者上述的任意合适的组合。
可以以一种或多种程序设计语言的任意组合来编写用于执行本申请操作的程序代码,程序设计语言包括面向对象的程序设计语言—诸如Java、C++等,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算设备上执行、部分地在用户设备上执行、作为一个独立的软件包执行、部分在用户计算设备上部分在远程计算设备上执行、或者完全在远程计算设备或服务器上执行。在涉及远程计算设备的情形中,远程计算设备可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)—连接到用户计算设备,或者,可以连接到外部计算设备。
本申请实施例针对目标检测的方法还提供一种计算设备可读存储介质,即断电后内容不丢失。该存储介质中存储软件程序,包括程序代码,当程序代码在计算设备上运行时,该软件程序在被一个或多个处理器读取并执行时可实现本申请实施例上面任何一种目标检测的方案。
以上参照示出根据本申请实施例的方法、装置(系统)和/或计算机程序产品的框图和/或流程图描述本申请。应理解,可以通过计算机程序指令来实现框图和/或流程图示图的一个块以及框图和/或流程图示图的块的组合。可以将这些计算机程序指令提供给通用计算机、专用计算机的处理器和/或其它可编程数据处理装置,以产生机器,使得经由计算机处理器和/或其它可编程数据处理装置执行的指令创建用于实现框图和/或流程图块中所指定的功能/动作的方法。
相应地,还可以用硬件和/或软件(包括固件、驻留软件、微码等)来实施本申请。更进一步地,本申请可以采取计算机可使用或计算机可读存储介质上的计算机程序产品的形式,其具有在介质中实现的计算机可使用或计算机可读程序代码,以由指令执行系统来使用或结合指令执行系统而使用。在本申请上下文中,计算机可使用或计算机可读介质可以是任意介质,其可以包含、存储、通信、传输、或传送程序,以由指令执行系统、装置或设备使用,或结合指令执行系统、装置或设备使用。
本申请结合多个流程图详细描述了多个实施例,但应理解,这些流程图及其相应的实施例的相关描述仅为便于理解而示例,不应对本申请构成任何限定。各流程图中的每一个步骤并不一定是必须要执行的,例如有些步骤是可以跳过的。并且,各个步骤的执行顺序也不是固定不变的,也不限于图中所示,各个步骤的执行顺序应以其功能和内在逻辑确定。
本申请描述的多个实施例之间可以任意组合或步骤之间相互交叉执行,各个实施例的执行顺序和各个实施例的步骤之间的执行顺序均不是固定不变的,也不限于图中所示,各个实施例的执行顺序和各个实施例的各个步骤的交叉执行顺序应以其功能和内在逻辑确定。
尽管结合具体特征及其实施例对本申请进行了描述,显而易见的,在不脱离本申请的精神和范围的情况下,可对其进行各种修改和组合。相应地,本说明书和附图仅仅是所附权利要求所界定的本申请的示例性说明,且视为已覆盖本申请范围内的任意和所有修改、变化、组合或等同物。显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包括这些改动和变型在内。

Claims (22)

  1. 一种目标检测方法,其特征在于,包括:
    利用图像数据得到点云数据;
    利用第一神经网络模型输出所述点云数据的第一三维信息;所述第一三维信息包括用于表示所述图像数据中的至少一个第一目标的至少一个第一立体框的信息,所述第一立体框的信息包括用于表示所述第一立体框的位置的第一坐标,所述第一立体框用于框定所述第一目标;
    利用第二神经网络模型输出所述图像数据的二维信息;所述二维信息包括用于表示所述图像数据中的至少一个第二目标的至少一个平面框的信息,所述平面框的信息包括用于表示所述平面框的位置的坐标;所述平面框用于框定所述第二目标;
    根据所述图像数据的深度信息以及所述图像数据的二维信息确定第二三维信息;所述第二三维信息包括用于表示所述图像数据中的至少一个第二目标的至少一个第二立体框的信息,所述第二立体框的信息包括用于表示所述第二立体框的位置的第二坐标,所述第二立体框用于框定所述第二目标;
    融合所述第一三维信息和所述第二三维信息中的相同目标,得到目标检测结果;其中,在所述融合第一三维信息和所述第二三维信息中的相同目标的过程中,所述第一三维信息中用于表示图像数据深度的信息的权重大于所述第二三维信息中用于表示图像数据深度的信息的权重,所述第一三维信息中用于表示图像数据平面的信息的权重小于所述第二三维信息中用于表示图像数据平面的信息的权重。
  2. 根据权利要求1所述的方法,其特征在于,所述融合所述第一三维信息和所述第二三维信息中的相同目标,得到目标检测结果,包括:
    利用第三神经网络模型融合所述第一三维信息和所述第二三维信息,得到所述目标检测结果;其中,所述第三神经网络模型的损失函数中,所述第一三维信息中用于表示图像数据深度的信息的权重大于所述第二三维信息中用于表示图像数据深度的信息的权重,所述第一三维信息中用于表示图像数据平面的信息的权重小于所述第二三维信息中用于表示图像数据平面的信息的权重。
  3. 根据权利要求2所述的方法,其特征在于,所述第三神经网络模型的损失函数与下述一项或多项相关:所述第一神经网络模型的置信度、所述第二神经网络模型的置信度、所述第一神经网络模型的输出结果与所述第一神经网络模型的真实样本的交并比、所述第二神经网络模型的输出结果与所述第二神经网络模型的真实样本的交并比、所述第一神经网络模型中数据的归一化数值或所述第二神经网络模型中数据的归一化数值。
  4. 根据权利要求2或3所述的方法，其特征在于，所述第一三维信息包括(X₁Y₁Z₁,W₁H₁L₁)，X₁Y₁Z₁为所述第一坐标，W₁H₁L₁表示所述第一立体框的长宽高；所述第二三维信息包括(X₂Y₂Z₂,W₂H₂L₂)，X₂Y₂Z₂为所述第二坐标，W₂H₂L₂表示所述第二立体框的长宽高；所述损失函数loss满足下述公式：
    loss=f((αX₂+(1-α)X₁),(αY₂+(1-α)Y₁),((1-β)Z₂+βZ₁),(αW₂+(1-α)W₁),(αH₂+(1-α)H₁),((1-β)L₂+βL₁)),其中0.5<α<1,0.5<β<1。
  5. 根据权利要求1-4任一项所述的方法,其特征在于,所述利用图像数据得到点云数据包括:
    对所述图像数据进行三维重建,得到所述点云数据。
  6. 根据权利要求1-5任一项所述的方法,其特征在于,所述图像数据是在自动驾驶过程中拍摄的,所述根据所述图像数据的深度信息以及所述图像数据的二维信息确定第二三维信息,包括:
    获取所述自动驾驶过程中拍摄得到的所述图像数据的相邻图像数据;
    利用所述图像数据和所述图像数据的相邻图像数据计算所述图像数据的深度信息;
    融合所述图像数据的深度信息和所述图像数据的二维信息,得到所述第二三维信息。
  7. 根据权利要求1-6任一项所述的方法,其特征在于,还包括:
    根据所述目标检测结果更新高精地图中的地标元素。
  8. 根据权利要求7所述的方法,其特征在于,所述根据所述目标检测结果更新高精地图中的地标元素,包括:
    确定所述目标检测结果中用于表示地标的地标检测结果;
    确定所述地标检测结果在高精地图中的位置;
    根据所述地标检测结果在高精地图中的位置,在所述高精地图中添加所述地标。
  9. 根据权利要求1-8任一项所述的方法,其特征在于,还包括:
    根据所述目标检测结果确定自动驾驶策略。
  10. 一种目标检测装置,其特征在于,包括:
    处理单元,用于利用图像数据得到点云数据;
    所述处理单元,还用于利用第一神经网络模型输出所述点云数据的第一三维信息;所述第一三维信息包括用于表示所述图像数据中的至少一个第一目标的至少一个第一立体框的信息,所述第一立体框的信息包括用于表示所述第一立体框的位置的第一坐标,所述第一立体框用于框定所述第一目标;
    所述处理单元,还用于利用第二神经网络模型输出所述图像数据的二维信息;所述二维信息包括用于表示所述图像数据中的至少一个第二目标的至少一个平面框的信息,所述平面框的信息包括用于表示所述平面框的位置的坐标;所述平面框用于框定所述第二目标;
    所述处理单元,还用于根据所述图像数据的深度信息以及所述图像数据的二维信息确定第二三维信息;所述第二三维信息包括用于表示所述图像数据中的至少一个第二目标的至少一个第二立体框的信息,所述第二立体框的信息包括用于表示所述第二立体框的位置的第二坐标,所述第二立体框用于框定所述第二目标;
    所述处理单元,还用于融合所述第一三维信息和所述第二三维信息中的相同目标,得到目标检测结果;其中,在所述融合第一三维信息和所述第二三维信息中的相同目标的过程中,所述第一三维信息中用于表示图像数据深度的信息的权重大于所述第二三维信息中用于表示图像数据深度的信息的权重,所述第一三维信息中用于表示图像数据平面的信息的权重小于所述第二三维信息中用于表示图像数据平面的信息的 权重。
  11. 根据权利要求10所述的装置,其特征在于,所述处理单元,具体用于利用第三神经网络模型融合所述第一三维信息和所述第二三维信息,得到所述目标检测结果;其中,所述第三神经网络模型的损失函数中,所述第一三维信息中用于表示图像数据深度的信息的权重大于所述第二三维信息中用于表示图像数据深度的信息的权重,所述第一三维信息中用于表示图像数据平面的信息的权重小于所述第二三维信息中用于表示图像数据平面的信息的权重。
  12. 根据权利要求11所述的装置,其特征在于,所述第三神经网络模型的损失函数与下述一项或多项相关:所述第一神经网络模型的置信度、所述第二神经网络模型的置信度、所述第一神经网络模型的输出结果与所述第一神经网络模型的真实样本的交并比、所述第二神经网络模型的输出结果与所述第二神经网络模型的真实样本的交并比、所述第一神经网络模型中数据的归一化数值或所述第二神经网络模型中数据的归一化数值。
  13. 根据权利要求10或12所述的装置，其特征在于，所述第一三维信息包括(X₁Y₁Z₁,W₁H₁L₁)，X₁Y₁Z₁为所述第一坐标，W₁H₁L₁表示所述第一立体框的长宽高；所述第二三维信息包括(X₂Y₂Z₂,W₂H₂L₂)，X₂Y₂Z₂为所述第二坐标，W₂H₂L₂表示所述第二立体框的长宽高；所述损失函数loss满足下述公式：
    loss=f((αX₂+(1-α)X₁),(αY₂+(1-α)Y₁),((1-β)Z₂+βZ₁),(αW₂+(1-α)W₁),(αH₂+(1-α)H₁),((1-β)L₂+βL₁)),其中0.5<α<1,0.5<β<1。
  14. 根据权利要求10-13任一项所述的装置,其特征在于,所述处理单元,具体用于对所述图像数据进行三维重建,得到所述点云数据。
  15. 根据权利要求10-14任一项所述的装置,其特征在于,所述处理单元,具体用于获取所述自动驾驶过程中拍摄得到的所述图像数据的相邻图像数据;利用所述图像数据和所述图像数据的相邻图像数据计算所述图像数据的深度信息;融合所述图像数据的深度信息和所述图像数据的二维信息,得到所述第二三维信息。
  16. 根据权利要求10-15任一项所述的装置,其特征在于,所述处理单元,还用于根据所述目标检测结果更新高精地图中的地标元素。
  17. 根据权利要求16所述的装置,其特征在于,所述处理单元,具体用于确定所述目标检测结果中用于表示地标的地标检测结果;确定所述地标检测结果在高精地图中的位置;根据所述地标检测结果在高精地图中的位置,在所述高精地图中添加所述地标。
  18. 根据权利要求10-17任一项所述的装置,其特征在于,所述处理单元,还用于根据所述目标检测结果确定自动驾驶策略。
  19. 一种目标检测装置,其特征在于,包括:至少一个处理器,用于调用存储器中的程序,以执行权利要求1-9任一项所述的方法。
  20. 一种目标检测装置,其特征在于,包括:至少一个处理器和接口电路,所述接口电路用于为所述至少一个处理器提供信息输入和/或信息输出,所述至少一个处理器用于执行权利要求1-9任一项所述的方法。
  21. 一种芯片,其特征在于,包括至少一个处理器和接口;
    所述接口,用于为所述至少一个处理器提供程序指令或者数据;
    所述至少一个处理器用于执行所述程序指令，以实现如权利要求1-9中任一项所述的方法。
  22. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有指令,当所述指令被执行时,使得计算机执行如权利要求1-9任一项所述的方法。
PCT/CN2020/130816 2020-11-23 2020-11-23 目标检测方法和装置 WO2022104774A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/130816 WO2022104774A1 (zh) 2020-11-23 2020-11-23 目标检测方法和装置
CN202080005159.6A CN112740268B (zh) 2020-11-23 2020-11-23 目标检测方法和装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/130816 WO2022104774A1 (zh) 2020-11-23 2020-11-23 目标检测方法和装置

Publications (1)

Publication Number Publication Date
WO2022104774A1 true WO2022104774A1 (zh) 2022-05-27

Family

ID=75609557

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/130816 WO2022104774A1 (zh) 2020-11-23 2020-11-23 目标检测方法和装置

Country Status (2)

Country Link
CN (1) CN112740268B (zh)
WO (1) WO2022104774A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115457105A (zh) * 2022-08-17 2022-12-09 北京鉴智科技有限公司 深度信息获取方法、装置、电子设备及存储介质
CN117017276A (zh) * 2023-10-08 2023-11-10 中国科学技术大学 一种基于毫米波雷达的实时人体紧密边界检测方法
CN117475389A (zh) * 2023-12-27 2024-01-30 山东海润数聚科技有限公司 一种人行横道信号灯的控制方法、系统、设备和存储介质

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113256735B (zh) * 2021-06-02 2021-10-08 杭州灵西机器人智能科技有限公司 一种基于双目标定的相机标定方法和系统
CN113111978B (zh) * 2021-06-11 2021-10-01 之江实验室 一种基于点云和图像数据的三维目标检测系统和方法
CN113674287A (zh) * 2021-09-03 2021-11-19 阿波罗智能技术(北京)有限公司 高精地图的绘制方法、装置、设备以及存储介质
CN114359891B (zh) * 2021-12-08 2024-05-28 华南理工大学 一种三维车辆检测方法、系统、装置及介质
CN114312578A (zh) * 2021-12-31 2022-04-12 优跑汽车技术(上海)有限公司 车辆及其控制方法、控制装置、存储介质
CN115082886B (zh) * 2022-07-04 2023-09-29 小米汽车科技有限公司 目标检测的方法、装置、存储介质、芯片及车辆
CN115205803A (zh) * 2022-07-14 2022-10-18 安徽蔚来智驾科技有限公司 自动驾驶环境感知方法、介质及车辆
CN118379902A (zh) * 2024-06-21 2024-07-23 中电建物业管理有限公司 一种基于多目标智能识别的停车场管理方法及系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110632608A (zh) * 2018-06-21 2019-12-31 北京京东尚科信息技术有限公司 一种基于激光点云的目标检测方法和装置
CN111311722A (zh) * 2020-01-23 2020-06-19 北京市商汤科技开发有限公司 信息处理方法及装置、电子设备和存储介质
CN111402130A (zh) * 2020-02-21 2020-07-10 华为技术有限公司 数据处理方法和数据处理装置
CN111723716A (zh) * 2020-06-11 2020-09-29 深圳地平线机器人科技有限公司 确定目标对象朝向的方法、装置、系统、介质及电子设备

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6945785B2 (ja) * 2016-03-14 2021-10-06 イムラ ウーロプ ソシエテ・パ・アクシオンス・シンプリフィエ 3dポイントクラウドの処理方法
CN107730503B (zh) * 2017-09-12 2020-05-26 北京航空航天大学 三维特征嵌入的图像对象部件级语义分割方法与装置
CN109241856A (zh) * 2018-08-13 2019-01-18 浙江零跑科技有限公司 一种单目车载视觉系统立体目标检测方法
CN111860060A (zh) * 2019-04-29 2020-10-30 顺丰科技有限公司 目标检测方法、装置、终端设备及计算机可读存储介质
CN110675431B (zh) * 2019-10-08 2020-09-11 中国人民解放军军事科学院国防科技创新研究院 一种融合图像和激光点云的三维多目标跟踪方法
CN111009011B (zh) * 2019-11-28 2023-09-19 深圳市镭神智能系统有限公司 车辆方向角的预测方法、装置、系统以及存储介质
CN111626217B (zh) * 2020-05-28 2023-08-22 宁波博登智能科技有限公司 一种基于二维图片和三维点云融合的目标检测和追踪方法
CN111797915A (zh) * 2020-06-24 2020-10-20 奇点汽车研发中心有限公司 目标检测方法、装置、电子设备及计算机可读存储介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110632608A (zh) * 2018-06-21 2019-12-31 北京京东尚科信息技术有限公司 一种基于激光点云的目标检测方法和装置
CN111311722A (zh) * 2020-01-23 2020-06-19 北京市商汤科技开发有限公司 信息处理方法及装置、电子设备和存储介质
CN111402130A (zh) * 2020-02-21 2020-07-10 华为技术有限公司 数据处理方法和数据处理装置
CN111723716A (zh) * 2020-06-11 2020-09-29 深圳地平线机器人科技有限公司 确定目标对象朝向的方法、装置、系统、介质及电子设备

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115457105A (zh) * 2022-08-17 2022-12-09 北京鉴智科技有限公司 深度信息获取方法、装置、电子设备及存储介质
CN117017276A (zh) * 2023-10-08 2023-11-10 中国科学技术大学 一种基于毫米波雷达的实时人体紧密边界检测方法
CN117017276B (zh) * 2023-10-08 2024-01-12 中国科学技术大学 一种基于毫米波雷达的实时人体紧密边界检测方法
CN117475389A (zh) * 2023-12-27 2024-01-30 山东海润数聚科技有限公司 一种人行横道信号灯的控制方法、系统、设备和存储介质
CN117475389B (zh) * 2023-12-27 2024-03-15 山东海润数聚科技有限公司 一种人行横道信号灯的控制方法、系统、设备和存储介质

Also Published As

Publication number Publication date
CN112740268B (zh) 2022-06-07
CN112740268A (zh) 2021-04-30

Similar Documents

Publication Publication Date Title
WO2022104774A1 (zh) 目标检测方法和装置
US11688181B2 (en) Sensor fusion for autonomous machine applications using machine learning
US11941873B2 (en) Determining drivable free-space for autonomous vehicles
US11966838B2 (en) Behavior-guided path planning in autonomous machine applications
US11676364B2 (en) Real-time detection of lanes and boundaries by autonomous vehicles
US12051206B2 (en) Deep neural network for segmentation of road scenes and animate object instances for autonomous driving applications
US11769052B2 (en) Distance estimation to objects and free-space boundaries in autonomous machine applications
US11906660B2 (en) Object detection and classification using LiDAR range images for autonomous machine applications
WO2021160184A1 (en) Target detection method, training method, electronic device, and computer-readable medium
US20210201145A1 (en) Three-dimensional intersection structure prediction for autonomous driving applications
US20240127062A1 (en) Behavior-guided path planning in autonomous machine applications
CN113994390A (zh) 针对自主驾驶应用的使用曲线拟合的地标检测
CN112904370A (zh) 用于激光雷达感知的多视图深度神经网络
US12039663B2 (en) 3D surface structure estimation using neural networks for autonomous systems and applications
US20230135088A1 (en) 3d surface reconstruction with point cloud densification using deep neural networks for autonomous systems and applications
US12100230B2 (en) Using neural networks for 3D surface structure estimation based on real-world data for autonomous systems and applications
US20230136235A1 (en) 3d surface reconstruction with point cloud densification using artificial intelligence for autonomous systems and applications
CN114973050A (zh) 自动驾驶应用中深度神经网络感知的地面实况数据生成
CN114167404A (zh) 目标跟踪方法及装置
US20230139772A1 (en) 3d surface structure estimation using neural networks for autonomous systems and applications
JP2023066377A (ja) 自律システム及びアプリケーションのための人工知能を使用した点群高密度化を有する3d表面再構成
US20240281988A1 (en) Landmark perception for localization in autonomous systems and applications
US20240280372A1 (en) Machine learning based landmark perception for localization in autonomous systems and applications
US20240282118A1 (en) Object detection using polygons for autonomous systems and applications

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20962065

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20962065

Country of ref document: EP

Kind code of ref document: A1