WO2021227645A1 - Target detection method and device - Google Patents

Target detection method and device - Download PDF

Info

Publication number
WO2021227645A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
target detection
location area
category
target
Prior art date
Application number
PCT/CN2021/081090
Other languages
English (en)
French (fr)
Inventor
尹晓萌
苏惠荞
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP21803599.6A (published as EP4141737A4)
Publication of WO2021227645A1
Priority to US17/985,479 (published as US20230072730A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/255 Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Definitions

  • the embodiments of the present application relate to the field of artificial intelligence technology, and in particular to a target detection method and device.
  • AI (artificial intelligence)
  • machine learning methods are usually used to build initial models of various structures, such as neural network models, support vector machine models, and decision tree models. Then, various initial models are trained to achieve purposes such as image recognition and natural language processing.
  • image recognition also includes recognizing the characters presented in the image and performing target detection on each object presented in the image.
  • the probability of misjudging small targets in the image can be reduced, which helps improve the detection accuracy of target detection.
  • in a first aspect, an embodiment of the present application provides a target detection method, which is applied to an electronic device. The target detection method includes: acquiring an image using a camera device; calibrating a region of interest in the image based on parameters of the camera device and a preset driving path; detecting the image using a target detection algorithm to obtain the category to which a target object in the image belongs, a first location area of the target object in the image, and the confidence of the category to which the target object belongs; and correcting the confidence of the category to which the target object belongs based on the relative position relationship between the first location area and the region of interest, to obtain a first confidence.
  • the target detection method further includes: determining a second location area in the image based on the parameters of the camera device, the boundary coordinates of the first location area in the image, and the preset size, in the real world, of the object corresponding to the category; detecting the error between the first location area and the second location area; and correcting the first confidence based on the error, to obtain a second confidence of the category to which the target object belongs.
  • by using the error between the second location area and the first location area to correct the first confidence, misjudgment of target objects on the road (for example, misjudging trees on a far stretch of road as pedestrians) can be further reduced, which further improves the accuracy of target object detection.
  • the parameters of the camera device include at least one of the following: the focal length of the camera device, the distance between the camera device and a reference surface, the conversion matrix from the camera device coordinate system to the image coordinate system, and the size of the photosensitive unit in the photosensitive element.
  • determining the second location area includes: determining the distance between the camera device and the target object based on the focal length of the camera device, the distance between the camera device and the reference surface, the conversion matrix from the camera device coordinate system to the image coordinate system, the size of the photosensitive unit in the photosensitive element, and the boundary coordinates of the first location area in the image; and determining the second location area in the image based on the distance between the camera device and the target object, the size in the real world of the object corresponding to the detected category, the distance between the camera device and the reference surface, and the boundary coordinates of the first location area.
  • the category to which the target object belongs is selected from multiple preset candidate categories by matching the features of the target object with the features of the objects corresponding to the preset candidate categories and selecting based on the matching result.
  • the above-mentioned target detection algorithm may be a pre-trained target detection model.
  • detecting the image using a target detection algorithm to obtain the category to which the target object in the image belongs, the first location area of the target object in the image, and the confidence of the category to which the target object belongs includes: setting calibration parameters in a pre-trained target detection model, where the calibration parameters are used to instruct the target detection model to calibrate multiple candidate regions in the image; and inputting the image into the target detection model to obtain an output result of the target detection model, where the output result is used to indicate whether an object of a preset candidate category is presented in each candidate region and the confidence of the category to which the target object belongs, and where the target detection model is obtained by training a neural network based on training samples and the calibration parameters used to calibrate the candidate regions.
  • multiple candidate regions in the image are predetermined based on constraint conditions;
  • the constraint conditions include: the area range, in the image, in which the object corresponding to each preset candidate category can be presented, and the imaging size range of the object corresponding to each preset candidate category in the image.
  • determining the multiple candidate regions in the image includes: marking initial candidate regions in the image; and screening the initial candidate regions using the constraint conditions, and obtaining the multiple candidate regions based on the screening results.
  • the method further includes an optimization step for the target detection model. The optimization step includes: obtaining a training sample set, where the training sample set includes a plurality of sample images and a target object is presented in each sample image; inputting the sample images into the target detection model to obtain the category of the target object in each sample image and the first location area of the target object in the sample image; determining the second location area in each sample image based on the category to which the target object in the sample image belongs, the boundary coordinates of the first location area, and the parameters of the shooting device used to shoot the sample image; using a preset loss function to determine the deviation between the first location area and the second location area in each training sample; and iteratively adjusting the target detection model based on the deviation to obtain an optimized target detection model.
  • the detection accuracy of the target detection model can be further improved, that is, the accuracy of road target detection can be improved, so as to provide guarantee for subsequent automatic driving vehicles to detect and avoid obstacles.
  • an embodiment of the present application provides a target detection device.
  • the target detection device includes: an acquisition module, configured to acquire an image using a camera device; a calibration module, configured to calibrate a region of interest in the image based on parameters of the camera device and a preset driving path; a first detection module, configured to detect the image using a target detection algorithm to obtain the category to which a target object in the image belongs, a first location area of the target object in the image, and the confidence of the category to which the target object belongs; and a first correction module, configured to correct the confidence of the category to which the target object belongs based on the relative position relationship between the first location area and the region of interest, to obtain a first confidence.
  • the target detection device further includes: a determining module, configured to determine, in response to the first confidence being greater than a preset threshold, a second location area in the image based on the parameters of the camera device, the boundary coordinates of the first location area in the image, and the preset size, in the real world, of the object corresponding to the category; a second detection module, configured to detect the error between the first location area and the second location area; and a second correction module, configured to correct the first confidence based on the error to obtain a second confidence of the category to which the target object belongs.
  • the parameters of the camera device include at least one of the following: the focal length of the camera device, the distance between the camera device and a reference surface, the conversion matrix from the camera device coordinate system to the image coordinate system, and the size of the photosensitive unit in the photosensitive element.
  • the determining module includes: a first determining sub-module, configured to determine the distance between the camera device and the target object based on the focal length of the camera device, the distance between the camera device and the reference surface, the conversion matrix from the camera device coordinate system to the image coordinate system, the size of the photosensitive unit in the photosensitive element, and the boundary coordinates of the first location area in the image; and a second determining sub-module, configured to determine the second location area in the image based on the distance between the camera device and the target object, the size in the real world of the object corresponding to the detected category, the distance between the camera device and the reference surface, and the boundary coordinates of the first location area.
  • the category to which the target object belongs is selected from multiple preset candidate categories by matching the features of the target object with the features of the objects corresponding to the preset candidate categories and selecting based on the matching result.
  • the first detection module includes: a setting sub-module, configured to set calibration parameters in a pre-trained target detection model, where the calibration parameters are used to instruct the target detection model to calibrate multiple candidate regions in the image; and a detection sub-module, configured to input the image into the target detection model to obtain an output result of the target detection model, where the output result is used to indicate whether an object of a preset candidate category is presented in each candidate region and the confidence of the category to which the target object belongs, and where the target detection model is obtained by training a neural network based on training samples and the calibration parameters used to calibrate the candidate regions.
  • multiple candidate regions in the image are predetermined based on constraint conditions;
  • the constraint conditions include: the area range, in the image, in which the object corresponding to each preset candidate category can be presented, and the imaging size range of the object corresponding to each preset candidate category in the image.
  • the setting sub-module is specifically configured to: mark initial candidate regions in the image; and screen the initial candidate regions using the constraint conditions, and obtain the multiple candidate regions based on the screening results.
  • the target detection device further includes a model optimization module, and the model optimization module is specifically configured to: obtain a training sample set, where the training sample set includes a plurality of sample images and a target object is presented in each sample image; input the sample images into the target detection model to obtain the category of the target object in each sample image and the first location area of the target object in the sample image; determine the second location area in each sample image based on the category to which the target object in the sample image belongs, the boundary coordinates of the first location area, and the parameters of the shooting device used to shoot the sample image; use a preset loss function to determine the deviation between the first location area and the second location area in each training sample; and iteratively adjust the target detection model based on the deviation to obtain an optimized target detection model.
  • an embodiment of the present application provides an electronic device that includes a memory, a processor, and a computer program stored in the memory and runnable on the processor, where, when the processor executes the computer program, the electronic device implements the method described in the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium, and the computer-readable storage medium stores instructions. When the instructions are run on a computer, they are used to execute the method described in the first aspect.
  • embodiments of the present application provide a computer program or computer program product, which when the computer program or computer program product is executed on a computer, causes the computer to execute the method described in the first aspect.
  • FIG. 1 is a schematic diagram of the hardware structure of an application scenario applied to an embodiment of the present application provided by an embodiment of the present application;
  • FIG. 2 is a schematic diagram of misjudgment of an object presented in an image in the prior art according to an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of a target detection method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of negative samples provided during the training process of the target detection model provided by the embodiment of the present application.
  • 5a-5e are schematic diagrams of an application scenario of the target detection method provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of another target detection method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of another application scenario of the target detection method provided by an embodiment of the present application.
  • FIG. 8a-8b are schematic diagrams of a method for determining a second location area in the application scenario shown in FIG. 7 according to an embodiment of the present application;
  • Fig. 9 is a schematic diagram of a target detection device provided by an embodiment of the present application.
  • the corresponding device may include one or more units, such as functional units, to perform the described one or more method steps (for example, one unit performs one or more steps, or multiple units each perform one or more of the multiple steps), even if such one or more units are not explicitly described or illustrated in the drawings.
  • the corresponding method may include one step to perform the functionality of one or more units (for example, one step performs the functionality of one or more units, or multiple steps each perform the functionality of one or more of the multiple units), even if such one or more steps are not explicitly described or illustrated in the drawings.
  • the target detection method described in this application can be applied to the field of image recognition and various scenes where small targets in images need to be detected and recognized.
  • the following takes the detection of small targets on the road in an autonomous driving scene as an example to describe this application in detail.
  • FIG. 1 is a schematic structural diagram of a vehicle in an embodiment of the application.
  • the components coupled to or included in the vehicle 100 may include a propulsion system 110, a sensor system 120, a control system 130, peripheral equipment 140, a power supply 101, a computing device 107, and a user interface 108.
  • the computing device 107 includes a processor 102, a transceiver 103, and a memory 104.
  • the computing device 107 may be a controller or a part of the controller of the vehicle 100.
  • the memory 104 includes instructions 106 that the processor 102 can run, and can also store map data 105.
  • the components of the vehicle 100 may be configured to work in a manner interconnected with each other and/or with other components coupled to various systems.
  • the power supply 101 may provide power to all components of the vehicle 100.
  • the computing device 107 may be configured to receive data from the propulsion system 110, the sensor system 120, the control system 130, and the peripheral device 140 and control them.
  • the computing device 107 may be configured to generate a display of images on the user interface 108 and receive input from the user interface 108.
  • the vehicle 100 may also include more, fewer, or different systems, and each system may include more, fewer, or different components.
  • each system may include more, fewer, or different components.
  • the systems and components shown can be combined or divided in any manner, which is not specifically limited in the embodiment of the present application.
  • the propulsion system 110 described above can be used to provide powered motion for the vehicle 100. Still referring to FIG. 1, the propulsion system 110 may include an engine/motor 114, an energy source 113, a transmission 112, and wheels/tires 111. Of course, the propulsion system 110 may additionally or alternatively include other components in addition to the components shown in FIG. 1, which is not specifically limited in the embodiment of the present application.
  • the sensor system 120 may include several sensors for sensing information about the environment in which the vehicle 100 is located. As shown in the figure, the sensors of the sensor system include a global positioning system GPS 126, an inertial measurement unit (IMU) 125, a lidar sensor 124, a vision sensor 123, a millimeter wave radar sensor 122, and at least one actuator 121 for modifying the position and/or orientation of the sensors.
  • the sensor system 120 may also include additional sensors, including, for example, a sensor that monitors the internal systems of the vehicle 100 (for example, at least one of an O2 monitor, a fuel gauge, an oil temperature gauge, etc.).
  • the sensor system 120 may also include other sensors.
  • the global positioning system (GPS) module 126 may be any sensor used to estimate the geographic location of the vehicle 100.
  • the GPS module 126 may include a transceiver to estimate the position of the vehicle 100 relative to the earth based on satellite positioning data.
  • the computing device 107 can be used in conjunction with the map data 105 to use the GPS module 126 to estimate the position of the lane boundary on the road on which the vehicle 100 can travel.
  • the GPS module 126 may also take other forms.
  • the IMU 125 may be used to sense the position and orientation change of the vehicle 100 based on the inertial acceleration and any combination thereof.
  • the combination of sensors may include, for example, an accelerometer and a gyroscope. Other combinations of sensors are also possible.
  • a LiDAR sensor (light detection and ranging, LiDAR) 124 can be regarded as an object detection system, which uses light to sense or detect objects in the environment where the vehicle 100 is located.
  • LIDAR 124 is an optical remote sensing technology that can measure the distance to the target or other attributes of the target by illuminating the target with light.
  • LIDAR 124 may include a laser source and/or laser scanner configured to emit laser pulses, and a detector for receiving reflections of laser pulses.
  • the LIDAR 124 may include a laser rangefinder reflected by a rotating mirror, and scan laser light around the digitized scene in one or two dimensions, so as to collect distance measurement values at specified angular intervals.
  • the LIDAR 124 may include components such as a light source (for example, a laser), a scanner and optical system, a light detector and receiver electronics, and a position and navigation system. LIDAR 124 determines the distance of an object by scanning the laser light reflected from the object, and can form a three-dimensional (3D) environment map with an accuracy of up to centimeters.
  • the visual sensor 123 may be any camera (for example, a still camera, a video camera, etc.) that obtains an image of the environment in which the vehicle 100 is located. To this end, the visual sensor 123 may be configured to detect visible light, or may be configured to detect light from other parts of the spectrum, such as infrared light or ultraviolet light. Other types of vision sensors are also possible. The vision sensor 123 may be a two-dimensional detector, or a detector having a three-dimensional spatial range. In some possible implementations, the visual sensor 123 may be, for example, a distance detector, which is configured to generate a two-dimensional image indicating the distance from the visual sensor 123 to several points in the environment. To this end, the visual sensor 123 may use one or more distance detection technologies.
  • the vision sensor 123 may be configured to use structured light technology, in which the vehicle 100 uses a predetermined light pattern, such as a grid or checkerboard pattern, to illuminate an object in the environment, and uses the vision sensor 123 to detect the reflection of the predetermined light pattern from the object. Based on the distortion in the reflected light pattern, the vehicle 100 may be configured to detect the distance of points on the object.
  • the predetermined light pattern may include infrared light or light of other wavelengths.
  • the millimeter-wave radar sensor 122 generally refers to an object detection sensor with a wavelength of 1 to 10 mm, and the approximate frequency range is 10 GHz to 200 GHz.
  • the measured value of millimeter wave radar carries depth information and can provide the distance of the target; in addition, because millimeter wave radar has an obvious Doppler effect and is very sensitive to speed, the speed of the target can be obtained directly by detecting the Doppler frequency shift.
  • the two mainstream automotive millimeter wave radar application frequency bands are 24GHz and 77GHz respectively.
  • the wavelength of the former is about 1.25 cm, and it is mainly used for short-distance sensing, such as sensing the surroundings of the car body, blind spots, parking assistance, lane change assistance, etc.; the wavelength of the latter is about 4 mm, and it is used for medium and long distance measurement, such as automatic car following, adaptive cruise control (ACC), and emergency braking (autonomous emergency braking, AEB).
  • ACC (adaptive cruise control)
  • AEB (autonomous emergency braking)
  • the control system 130 may be configured to control the operation of the vehicle 100 and its components.
  • the control system 130 may include a steering unit 136, a throttle 135, a braking unit 134, a sensor fusion unit 133, a computer vision system 132, and a navigation or pathing system 131.
  • the control system 130 may additionally or alternatively include other components in addition to the components shown in FIG. 1, which is not specifically limited in the embodiment of the present application.
  • the peripheral device 140 may be configured to allow the vehicle 100 to interact with external sensors, other vehicles, and/or users.
  • the peripheral device 140 may include, for example, a wireless communication system 144, a touch screen 143, a microphone 142, and/or a speaker 141.
  • the peripheral device 140 may additionally or alternatively include other components in addition to the components shown in FIG. 1, which is not specifically limited in the embodiment of the present application.
  • the power supply 101 may be configured to provide power to some or all of the components of the vehicle 100.
  • the power supply 101 may include, for example, a rechargeable lithium-ion or lead-acid battery.
  • one or more battery packs may be configured to provide power.
  • Other power supply materials and configurations are also possible.
  • the power supply 101 and the energy source 113 may be implemented together.
  • the processor 102 included in the computing device 107 may include one or more general-purpose processors and/or one or more special-purpose processors (for example, image processors, digital signal processors, etc.). To the extent that the processor 102 includes more than one processor, these processors may work separately or in combination.
  • the computing device 107 may implement the function of controlling the vehicle 100 based on the input received through the user interface 108.
  • the transceiver 103 is used for communication between the computing device 107 and various systems.
  • the memory 104 may further include one or more volatile storage components and/or one or more non-volatile storage components, such as optical, magnetic, and/or organic storage devices, and the memory 104 may be fully or partially integrated with the processor 102.
  • the memory 104 may contain instructions 106 (for example, program logic) executable by the processor 102 to execute various vehicle functions, including any of the functions or methods described in the embodiments of the present application.
  • the components of the vehicle 100 may be configured to work in a manner interconnected with other components inside and/or outside of their respective systems. To this end, the components and systems of the vehicle 100 may be connected together through a system bus, a network, and/or other connection mechanisms.
  • a target detection algorithm is usually used to detect targets on the road in real time to ensure the safety of the vehicle. For example, through target detection, the vehicle can be notified of the driving area and the location of obstacles can be marked, thereby assisting the vehicle in avoiding obstacles.
  • the computing device trains a neural network that can recognize specific types of objects through deep learning.
  • specific types of objects can be common target objects such as pedestrians, vehicles, trees, houses, and road facilities.
  • the computing device can recognize the above-mentioned specific category of objects through the neural network. Since the neural network learns the characteristics of the above-mentioned specific types of objects, when some similar characteristics appear in the image, it is usually impossible to perform effective recognition, and it is easy to cause misjudgment.
  • a pedestrian symbol is presented on the sign, and this pedestrian is not an actual pedestrian object.
  • the characteristics of the pedestrian signs displayed on the sign are usually similar to those of pedestrians on the far road, causing the neural network to misjudge the pedestrian signs displayed on the sign as small target pedestrians on the road, such as the one shown in Figure 2.
  • the probability that the neural network judges that the pedestrian on the sign shown in Figure 2 is a pedestrian is 0.85, which reduces the accuracy of target detection.
  • embodiments of the present application provide a target detection method, which can be applied to a target detection device.
  • the target detection device may be the computing device described in the foregoing embodiment or a part of the computing device.
  • FIG. 3 is a schematic flowchart of a target detection method shown in an embodiment of the application. As shown in Figure 3, the method includes:
  • S301 Acquire an image by using a camera device.
  • the camera here is the visual sensor in the above-mentioned sensor system, which is used to collect images of the road in front of the vehicle body.
  • the image may include objects such as pedestrians, vehicles, roads, barriers, etc., of course, may also include sidewalks, sidewalk trees, traffic lights, etc., which are not specifically limited in the embodiment of the present application.
  • the camera device may be a monocular camera, and the monocular camera captures an image to be processed at a time.
  • the camera device may also include a multi-lens camera, and the lenses may be physically combined in one camera device or physically separated in multiple camera devices. Multiple images are captured at the same time through the multi-lens camera, and these images can be processed to obtain an image to be recognized.
  • the imaging device may also be in other situations, which are not specifically limited in the embodiment of the present application.
  • the camera device can collect images in real time, or can collect images periodically; for example, the period may be 3 s, 5 s, or 10 s.
  • the camera device may also collect images in other ways, which is not specifically limited in the embodiment of the present application.
  • the image can be transferred to the above-mentioned target detection device.
  • the target detection device can obtain the image. It should be noted here that S301 may be executed after the vehicle is started, or after the vehicle starts the automatic driving function.
  • Step S302 Based on the parameters of the camera device and the preset driving route, a region of interest (ROI) in the image is calibrated.
  • ROI (region of interest)
  • the image acquired by the above-mentioned camera device is usually a road condition image.
  • the image usually presents objects in front of and on both sides of the road.
  • for example, vehicles ahead of or beside the road, pedestrians, trees on both sides of the road, tires or wooden boxes in the lane, and the like.
  • the region of interest in the image can be the road ahead and the roads on both sides.
  • the ROI can be an area outlined by a box, circle, ellipse, or irregular polygon. As shown in Figure 5b, it schematically shows the ROI in the image.
  • the ROI can be determined by the parameters of the camera device and the preset driving route.
  • the parameters of the camera device include, but are not limited to: the height between the camera device and the reference plane, and the location where the camera device is installed on the vehicle; the preset driving path includes, but is not limited to: the lane where the vehicle is located and the vehicle driving rules (such as keep left or keep right).
  • after acquiring the image, the above-mentioned target detection device can calibrate the region of interest in the image based on the parameters of the camera device and the preset driving path, for example as sketched below.
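  • As an illustration only (not the patent's implementation), the following sketch marks a trapezoidal ROI covering the road ahead by projecting an assumed driving corridor onto the image with a flat-ground pinhole model; the camera height, focal length in pixels, principal point, and corridor dimensions are hypothetical parameters.

```python
import numpy as np

def calibrate_roi(fx_px, fy_px, u0, v0, cam_height,
                  corridor_half_width=5.0, z_near=5.0, z_far=150.0):
    """Return a polygon (pixel coordinates) approximating the road ROI ahead.

    Assumes a forward-looking camera mounted at height `cam_height` above a flat
    road, with the optical axis roughly parallel to the ground. A road point at
    distance Z and lateral offset X projects to:
        u = u0 + fx_px * X / Z,   v = v0 + fy_px * cam_height / Z
    The corridor half-width and distance range stand in for the preset driving path.
    """
    corners_world = [(-corridor_half_width, z_near), (corridor_half_width, z_near),
                     (corridor_half_width, z_far), (-corridor_half_width, z_far)]
    polygon = [(u0 + fx_px * x / z, v0 + fy_px * cam_height / z)
               for x, z in corners_world]
    return np.array(polygon)  # trapezoid: wide near edge, narrow far edge

# Example with hypothetical intrinsics: roi = calibrate_roi(1000.0, 1000.0, 640.0, 360.0, 1.5)
```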
  • Step S303 Use a target detection algorithm to detect the acquired image to obtain the category to which the target object in the image belongs, the location area of the target object in the image, and the confidence of the category to which the target object belongs.
  • Step S303 can be specifically implemented through the following steps:
  • Step 1 Set calibration parameters in the target detection model.
  • the calibration parameters are used to instruct the target detection model to calibrate multiple candidate regions in the image.
  • the calibration parameters are the position parameters (for example, the position coordinates in the image) of each of the multiple candidate areas.
  • the position of the candidate area in the image is predetermined. Determining the candidate area here may also be referred to as setting a priori anchor.
  • various existing methods such as manual setting, K-means clustering, RPN (region proposal network) algorithm, or selective search algorithm can be used to determine multiple candidate regions. Then, the position parameters of the determined candidate regions in the image are set in the target detection model.
  • the method of determining the candidate area can be implemented in the following two ways:
  • the image is equally divided into multiple units. Then, for each equally divided unit in the image, multiple a priori frames with different aspect ratios are set, and the difference in the aspect ratio of the multiple a priori frames may be an arithmetic sequence.
  • the multiple prior boxes thus set are the multiple candidate regions.
  • in the second way, the image is segmented using a segmentation method to obtain a candidate-region image set consisting of multiple image blocks. The similarity of every two adjacent regions in the candidate-region image set is calculated (for example, it can be determined from multiple dimensions such as color similarity and texture similarity), and finally the regions with similarity higher than a preset threshold are merged to determine the final candidate regions.
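  • A minimal sketch of the first way (grid-based prior boxes) is given below; the grid size, box scale, and the arithmetic sequence of aspect ratios are illustrative assumptions rather than values from the patent.

```python
def grid_prior_boxes(img_w, img_h, grid=(16, 9), scale=0.08,
                     aspect_ratios=(0.5, 1.0, 1.5, 2.0)):
    """Divide the image into equal cells and place prior boxes in each cell.

    `aspect_ratios` (width/height) form an arithmetic sequence, as the text
    suggests. Boxes are returned as (u_left, v_top, u_right, v_bottom) in pixels.
    """
    boxes = []
    cell_w, cell_h = img_w / grid[0], img_h / grid[1]
    base = scale * min(img_w, img_h)                      # base side length
    for i in range(grid[0]):
        for j in range(grid[1]):
            cu, cv = (i + 0.5) * cell_w, (j + 0.5) * cell_h   # cell centre
            for ar in aspect_ratios:
                w, h = base * ar ** 0.5, base / ar ** 0.5     # constant box area
                boxes.append((cu - w / 2, cv - h / 2, cu + w / 2, cv + h / 2))
    return boxes
```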
  • Step 2: Input the image into the pre-trained target detection model in which the calibration parameters have been set, so that the target detection model detects, for each candidate area, the confidence that a target object of a preset candidate category is presented and the positioning deviation of the candidate area.
  • when the confidence for a candidate region is high, that is, when the confidence exceeds a threshold, it is predicted that the corresponding region presents a target object of a preset candidate category.
  • the positioning deviation of a candidate region is determined by the target detection model through edge detection of the candidate region and through the processing of the multiple candidate regions by a fully connected layer (or a fully convolutional layer).
  • the target detection model can adjust the position of the candidate area in the image through the positioning deviation of the candidate area. Then, the target detection model outputs the position information in the image of the candidate area presenting the target object, the category to which the target object presented in the candidate area belongs, and the confidence level of the category to which the target object belongs.
  • the preset candidate category may be obstacles on the road, including but not limited to pedestrians, vehicles, trees, tires, boxes, construction signs, etc.
  • the position information in the image of the candidate area presenting the target object output by the target detection model is the position information after re-adjusting the position in the image based on the positioning deviation of the candidate area.
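  • The patent text does not specify how the positioning deviation is parameterized; a common convention in anchor-based detectors (centre/size offsets), shown here only as an assumption, is:

```python
import math

def apply_positioning_deviation(box, deltas):
    """Adjust a candidate box (u_left, v_top, u_right, v_bottom) using predicted
    offsets (dx, dy, dw, dh), following the usual centre/size parameterization."""
    dx, dy, dw, dh = deltas
    w, h = box[2] - box[0], box[3] - box[1]
    cx, cy = box[0] + w / 2, box[1] + h / 2
    cx, cy = cx + dx * w, cy + dy * h            # shift the centre
    w, h = w * math.exp(dw), h * math.exp(dh)    # rescale width and height
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```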
  • the candidate area described in the first step may also be determined based on constraint conditions.
  • the above-mentioned selective search algorithm, manual setting, RPN or K-means clustering methods can be used to mark the initial candidate area in the image; then the initial candidate area can be screened using the constraint conditions to obtain the final candidate area.
  • the constraint conditions specifically refer to: the area range of each object in the preset candidate category presented in the image, and the imaging size range of each object in the preset candidate category in the image.
  • the range of the area where each object in the preset candidate category appears in the image is determined based on the location area where each object may appear in the real world.
  • the location area in which each object may appear in the real world can be mapped into the image, and the mapped area in the image is the area range in which the object is presented in the image.
  • the imaging size range of each object in the preset candidate categories in the image is determined based on the distance between the target object and the camera device, and the height and width of each object in the real world. At different distances, the imaging sizes of objects of the same category in images captured by the same camera are not the same. In practice, the imaging size of each object in the image at different distances can be determined based on the focal length of the imaging device, the size of the photosensitive unit in the photosensitive element, the optical center parameters of the imaging device, the position coordinates of the road surface in the world coordinate system, the height between the imaging device and the reference surface, and the height and width of each object in the real world; the imaging size range of each object in the image can then be determined.
  • taking pedestrians as an example, the above constraint conditions include: the possible position of a pedestrian is the road area, and the imaging size range of a pedestrian in the image is a to b, where a is smaller than b.
  • for example, when a pedestrian is at the far end of the distance range that the camera device can capture, the imaging size in the image is a; when the pedestrian is at the near end, the imaging size is b. For example, 50 meters to 300 meters can be considered as the distance range that the camera device can capture.
  • the imaging size of a pedestrian in the image is determined based on the height and width of the pedestrian in the real world, the distance between the pedestrian and the camera device, the focal length of the camera device, the size of the photosensitive unit in the photosensitive element, the optical center parameters of the camera device, and the height between the camera device and the reference surface.
  • the area range of the road surface presented in the image can be calibrated and recorded as the road surface range. Then, from the initial candidate areas, the candidate areas that are located within the road surface range and whose sizes are between a and b are selected. The selected areas are regarded as the final candidate areas; a sketch of this screening is given below.
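  • A minimal sketch of the screening, given only for illustration: `road_mask` (a boolean image marking the calibrated road surface range) and the pixel-size bounds a and b are assumed inputs.

```python
def screen_candidates(initial_boxes, road_mask, size_min, size_max):
    """Keep candidate boxes whose bottom edge lies on the road surface and whose
    pixel height falls within the imaging size range [size_min, size_max]."""
    kept = []
    h_img, w_img = road_mask.shape
    for (u1, v1, u2, v2) in initial_boxes:
        height = v2 - v1
        u_mid, v_bottom = int((u1 + u2) / 2), min(int(v2), h_img - 1)
        on_road = 0 <= u_mid < w_img and bool(road_mask[v_bottom, u_mid])
        if on_road and size_min <= height <= size_max:
            kept.append((u1, v1, u2, v2))
    return kept
```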
  • the above-mentioned reference plane may be a horizontal plane.
  • the target detection model described in the second step is obtained by training the neural network based on the training sample set and the preset loss function.
  • the training sample set includes sample images and label information for the sample images.
  • the target objects in the sample images here are small road targets, that is, each target object occupies a small area in the image (for example, an area smaller than a certain preset area threshold).
  • Sample images include positive sample images and negative sample images.
  • negative sample images include images that present the shape or outline of one of the preset candidate categories but actually belong to another category; that other category may itself belong to the preset candidate categories, or may not belong to any of the preset candidate categories. For example, an image of a tree that locally has a pedestrian-like silhouette, an image of a trash can with a puppy-like silhouette, or an image showing a pedestrian sign as shown in FIG. 4.
  • the annotation information is used to indicate the category to which the object in the positive sample image belongs and the position of the object in the sample image.
  • the annotation information is also used to indicate the category to which the object presented in the negative sample image belongs or does not belong to any one of the preset candidate categories, and the position of the object presented in the image in the sample image.
  • the preset candidate categories include two categories: pedestrians and trees. Among them, pedestrians are represented by 1, trees are represented by 2, and those that do not belong to any category are represented by 0.
  • the label information for the negative sample shown in FIG. 4 is the category 0 and the location area (a, b, c, d), where a, b, c, and d are respectively the image coordinates, in FIG. 4, of the four vertices of the rectangular frame shown in FIG. 4.
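  • For illustration only, an annotation record following the coding above (0: none of the preset candidate categories, 1: pedestrian, 2: tree) might look like the following; the field names, file name, and coordinate values are hypothetical placeholders.

```python
negative_sample_annotation = {
    "image": "fig4_pedestrian_sign.png",  # hypothetical file name
    "category": 0,                        # 0: none, 1: pedestrian, 2: tree
    # a, b, c, d: image coordinates of the four vertices of the rectangular frame
    "box": {"a": (412, 118), "b": (478, 118), "c": (478, 236), "d": (412, 236)},
}
```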
  • the candidate region is obtained, and then the position parameter of the candidate region in the image is set in the neural network.
  • the sample images are input to the neural network in which the calibration parameters have been set, and output information is obtained; at the beginning of training this output is essentially random, because the network weights have not yet been adjusted.
  • the output information includes the category corresponding to the object presented in the sample image, the location area in the image, and the confidence of the category corresponding to the presented object.
  • the preset loss function is used to calculate the deviation between the output information and the label information, and based on the deviation, the weight parameters of the neural network are iteratively adjusted to obtain the above-mentioned target detection model.
  • the preset loss function here may include, but is not limited to: a mean square error function and so on.
  • the above-mentioned annotation information may include the confidence of the annotation, and the deviation between the calculated output information and the annotation information herein may refer to the deviation between the confidence of the category corresponding to the presented object and the confidence of the annotation.
  • the aforementioned neural network may include a convolutional layer, a pooling layer, a hidden layer, a fully connected layer, and so on.
  • the number of layers can be determined according to the number of categories to be recognized, the number of target categories presented in each image, and the number of pixels in the image.
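  • A highly simplified training-loop sketch using PyTorch and a mean square error loss, as the text mentions; the model head, the target encoding, and the hyper-parameters are assumptions for illustration only.

```python
import torch
from torch import nn

def train_detector(model: nn.Module, loader, epochs=10, lr=1e-3):
    """Iteratively adjust the network weights against labelled samples.

    `loader` is assumed to yield (images, targets) pairs in which `targets`
    encodes, per candidate region, the annotated category confidence (and,
    optionally, location); the deviation is measured with MSE as one possible
    preset loss function.
    """
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, targets in loader:
            preds = model(images)            # predicted confidences/offsets
            loss = loss_fn(preds, targets)   # deviation from the annotations
            optimizer.zero_grad()
            loss.backward()                  # backpropagation
            optimizer.step()                 # iterative weight adjustment
    return model
```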
  • Step S304 Based on the determined relative position relationship between the location area and the region of interest, the confidence of the category to which the target object belongs is corrected to obtain a first confidence.
  • when the relative position relationship between the first location area and the region of interest is not the expected one (for example, the location area lies outside the region of interest), the first confidence can be set to a lower value, such as 0 or 0.1. In other words, the probability that the target object belongs to this category is considered very low.
  • conversely, when the relative position relationship is the expected one, the first confidence can be set to a higher value, such as 0.8, 0.9, or 1.
  • for example, when the detected target object is a pedestrian, the pedestrian usually has contact with the ground, whether the feet touch the ground directly or the pedestrian rides a vehicle such as a motorcycle or bicycle on the ground.
  • in this case, the ROI in the image is the ground range. It is then determined whether the lower boundary of the location area in which the person is presented lies within the ground range, so as to determine whether the person presented in the image is in contact with the ground.
  • if the lower boundary is not within the ground range, the first confidence is set to a lower confidence value.
  • in this way, whether the category to which the detected target object belongs is correct can be further verified, and some unreasonable category detection results can be filtered out, so as to improve the accuracy of target detection.
  • the image A as shown in FIG. 5a is acquired by the camera device.
  • the target detection device can determine the region of interest in the image A.
  • the region of interest in image A can be a ground-range region, as shown in Figure 5b.
  • the target detection device installed in the self-driving vehicle can record the boundary coordinates of the region of interest in the image A.
  • then, a plurality of candidate regions are calibrated in the image to obtain image B, as shown in FIG. 5c.
  • in image B, a large number of rectangular boxes are distributed over the image, and each rectangular box is a candidate area.
  • FIG. 5c is schematic. In actual applications, more or fewer rectangular boxes are included, and the size of each rectangular box can also be determined according to the needs of the application scenario.
  • the image B is input to a pre-trained target detection model, so as to determine whether a target object of a preset candidate category is present in each candidate area, as shown in FIG. 5d.
  • Figure 5d shows the detection result output by the target detection model. It can be seen from Figure 5d that the target detection model detects that the target object presented in candidate area a is a pedestrian with a probability of 0.85, the target object presented in candidate area b is a tree with a probability of 0.9, and the target object presented in candidate area c is a pedestrian with a probability of 0.7.
  • the candidate area a, candidate area b, and candidate area c are the location areas detected by the target detection model.
  • the target detection device can compare the image coordinates of each candidate region image with the boundary coordinates of the region of interest to determine whether the image coordinates of each candidate region image are within the range of the region of interest.
  • the candidate area c is not located within the region of interest of the image.
  • the target detection device can determine that the object presented by the candidate region image c is a pedestrian with a confidence level of 0.1.
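  • Putting steps S303 and S304 together, a minimal sketch of the ROI-based confidence correction is given below; the point-in-polygon test, the use of the midpoint of the lower boundary as the ground-contact point, and the corrected value 0.1 are illustrative assumptions consistent with the example above.

```python
from matplotlib.path import Path

def correct_confidence_with_roi(detections, roi_polygon, low_conf=0.1):
    """detections: list of dicts with 'box' (u1, v1, u2, v2), 'category', 'conf'.
    roi_polygon: Nx2 array of pixel coordinates delimiting the region of interest.
    Sets a 'first_conf' field based on the relative position relationship."""
    roi = Path(roi_polygon)
    for det in detections:
        u1, v1, u2, v2 = det["box"]
        ground_point = ((u1 + u2) / 2, v2)        # midpoint of the lower boundary
        inside = roi.contains_point(ground_point)
        det["first_conf"] = det["conf"] if inside else low_conf
    return detections
```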
  • FIG. 6 shows a flowchart of another embodiment of the target detection method provided by the present application.
  • the target detection method includes:
  • S601 Acquire an image by using a camera device.
  • S603 Detect the acquired image by using a target detection algorithm to obtain the category to which the target object in the image belongs, the location area of the target object in the image, and the confidence of the category to which the target object belongs.
  • S604 Based on the determined relative position relationship between the location area and the region of interest, correct the confidence of the category to which the target object belongs to obtain a first confidence.
  • step S601 to step S604 For the specific implementation of step S601 to step S604 and the beneficial effects brought by it, refer to the related description of step S301 to step S304 in the embodiment shown in FIG. 3, which will not be repeated here.
  • step S605 Detect whether the first confidence of the category of the target object determined in step S604 is greater than a preset threshold. When it is greater than the preset threshold, step S606 to step S608 are executed. When the value is less than or equal to the preset threshold, the value of the first confidence level is output.
  • S606 Determine the second location area in the image based on the parameters of the camera device, the boundary coordinates of the first location area in the image, and the size of the object corresponding to the preset category in the real world.
  • the parameters of the imaging device here specifically include but are not limited to: the focal length of the imaging device, the size of the photosensitive unit in the photosensitive element, the conversion matrix of the imaging device, the distance between the imaging device and the reference surface, and the optical center parameters of the imaging device.
  • the reference surface here can be the ground.
  • Step S606 can be specifically implemented through the following steps.
  • Step 1 Based on the focal length of the camera device, the distance between the camera device and the reference surface, the conversion matrix converted from the camera device coordinate system to the image coordinate system, the size of the photosensitive unit in the photosensitive element, and the boundary of the first location area in the image Coordinates, determine the distance between the camera device and the target object.
  • the size of the object corresponding to the category in the real world is queried.
  • the height of pedestrians on the road is usually between 130cm-190cm
  • the lateral width is usually between 43cm-55cm.
  • in formula (1), P_w = [X_w Y_w Z_w 1]^T is the coordinate of the target in the world coordinate system; K is the conversion matrix of the camera device; T is the translation matrix from the world coordinate system to the camera coordinate system, in which the height of the camera device is set; I is the identity matrix; and Z_w represents the distance between the camera device and the target object.
  • formula (2) is a formula, applied in the embodiment of the present application, for determining the distance between the camera device and the target object, obtained by refining and deriving formula (1).
  • according to formula (2), the distance Z_w between the camera device and the target object is determined from the size of the target in the real world and the size of the target on the imaging plane.
  • in formula (2), f_x and f_y are the focal lengths of the imaging device in the x-axis and y-axis directions respectively; d_x and d_y are the sizes of the photosensitive unit in the x-axis and y-axis directions on the photosensitive element respectively; w_w and h_w are the width and height of the target in the real world respectively; w and h are the width and height of the target in the image; and Z_w is the distance between the target object and the camera device.
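  • The original formulas (1) and (2) are not reproduced in this text. A plausible reconstruction, consistent with the symbol definitions above and with the standard pinhole camera model (given here only as an assumption), is:

```latex
% (1) Pinhole projection from world coordinates to pixel coordinates:
Z_w \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
  = K \, [\, I \mid T \,] \, P_w ,
\qquad
P_w = \begin{bmatrix} X_w & Y_w & Z_w & 1 \end{bmatrix}^{T}

% (2) Distance from the ratio of real-world size to imaged size:
Z_w \approx \frac{f_x}{d_x} \cdot \frac{w_w}{w}
    \approx \frac{f_y}{d_y} \cdot \frac{h_w}{h}
```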
  • Step 2: Based on the distance between the camera device and the target object, the size in the real world of the object corresponding to the detected category, the distance between the camera device and the reference surface, and the boundary coordinates of the first location area, and constrained by the assumption that the road target lies on the road plane, determine the second location area in the image.
  • the boundary coordinates of the first location area may be the lower boundary coordinates of the first area. It can include multiple coordinate points or one coordinate point. Wherein, when the lower boundary coordinates of the first location area include a coordinate point, the coordinate point may be the midpoint of the lower boundary, or may be a vertex where the lower boundary intersects with other boundaries (for example, the left boundary or the right boundary).
  • formula (3) can be used to infer the position in the image of the object of the category determined in step S603, which is also the second position area.
  • formula (3) can determine the height of an object of this category in the image, where the height is the height along the direction of gravity; then, based on the width-to-height ratio of objects of this category in the real world, the width of the object in the image can be determined.
  • the second location area takes the lower boundary of the first location area as its bottom edge (for example, the midpoint of the lower boundary of the first location area serves as the midpoint of the bottom edge, or one vertex of the lower boundary of the first location area serves as the starting point of the bottom edge), takes the width determined by formula (3) as the distance between its left and right boundaries, and takes the determined height as the distance between its lower and upper boundaries. In this way, the specific coordinate range of the second location area in the image can be determined.
  • (u, v) are the image-coordinate-system coordinates of a fixed point of the first location area in the image (for example, the midpoint or a vertex of its lower boundary); (X_w, Y_w, Z_w) are the world-coordinate-system coordinates of the corresponding point of the target object in reality (for example, the contact point between a pedestrian's feet and the ground, or between a car wheel and the ground).
  • f_x, f_y are the focal lengths of the camera device along the x-axis and y-axis in the camera coordinate system; d_x, d_y are the sizes of a photosensitive unit on the photosensitive element along the x-axis and y-axis in the camera coordinate system.
  • u_0, v_0 is the center of the image plane (the pixel coordinates of the image center).
  • h_w is the height of the target in the real world; h_com is the distance between the camera plane and the reference plane; the remaining term of formula (3) is the imaging height derived from the reference distance and the target height.
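  • To make the geometry of the second location area concrete, the sketch below estimates the distance from the imaged width via formula (2), projects the assumed real-world height back into the image with the pinhole relation h' = f_y·h_w/(d_y·Z_w), and anchors the resulting box on the lower boundary of the first location area. This is a simplified illustration under stated assumptions (a flat road plane and that pinhole relation); it is not a reproduction of the exact formula (3), which appears only as an image in the original.

```python
def infer_second_area(first_box, real_w_m, real_h_m,
                      focal_x, focal_y, pix_x, pix_y):
    """Hedged sketch of step S606: infer the second location area.

    first_box: (u_left, v_top, u_right, v_bottom) of the first location area, in px.
    real_w_m, real_h_m: assumed real-world width/height of the detected category.
    focal_x, focal_y, pix_x, pix_y: camera focal lengths and photosensitive-unit sizes.
    """
    u_left, v_top, u_right, v_bottom = first_box
    box_width_px = u_right - u_left
    # Distance from the imaged width, formula (2): Z_w = w_w * f_x / (w * d_x)
    z_w = real_w_m * focal_x / (box_width_px * pix_x)
    # Expected imaging height at that distance (the h' of Fig. 8b):
    # h' = f_y * h_w / (d_y * Z_w)
    h_img = focal_y * real_h_m / (pix_y * z_w)
    # Width of the second area from the real-world width-to-height ratio.
    w_img = h_img * (real_w_m / real_h_m)
    # Road-plane assumption: the target touches the ground at the lower boundary
    # of the first location area, so reuse that edge as the bottom of the box.
    u_mid = (u_left + u_right) / 2.0
    return (u_mid - w_img / 2.0, v_bottom - h_img, u_mid + w_img / 2.0, v_bottom)
```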
  • Detecting the error between the first location area and the second location area may mean detecting the error between their heights, the error between their widths, or the error between their width-to-height ratios.
  • By determining the error between the first location area and the second location area, it can be inferred whether the category of the target object detected in step S603 is accurate. When the error between the two areas is greater than a preset threshold, the category of the target object detected in step S603 can be considered not credible. For example, suppose step S603 determines that the target object in the first location area is a person, and the distance between the upper and lower boundaries of the first location area (that is, the person's height in the image) is 200 px, while the corresponding height of the second location area determined in step S606 is 400 px; or the distance between the left and right boundaries of the first location area (the person's width in the image) is 80 px, while the corresponding width of the second location area is 200 px. The error between the first location area and the second location area is then relatively large, so the category of the target object detected in step S603 can be considered not credible. When the error between the first location area and the second location area is less than the preset threshold, that is, the error is small, the category of the target object detected in step S603 can be considered credible.
  • S608: Based on the error between the first location area and the second location area, correct the first confidence level to obtain the second confidence level of the category to which the target object belongs.
  • The second confidence level indicates whether the category to which the target object belongs is credible. When the above error is large, the second confidence level can be set to a lower value; when the error is small, it can be set to a higher value.
  • When the second confidence level is greater than a preset threshold (for example, 0.7), the category of the target object detected in step S603 can be considered credible; when the second confidence level is less than or equal to the preset threshold (for example, 0.7), the category of the target object detected in step S603 can be considered not credible.
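  • A minimal sketch of steps S607 and S608 might compare the two areas and map the discrepancy to a corrected confidence. The specific error measure (relative height difference) and the fixed 0.1/0.9 output values below are illustrative assumptions; the disclosure only requires that a large error lowers the confidence and a small error raises it.

```python
def box_height(box):
    # box = (u_left, v_top, u_right, v_bottom)
    return box[3] - box[1]

def correct_confidence(first_conf, first_box, second_box,
                       error_threshold=0.3, low_conf=0.1, high_conf=0.9):
    """Steps S607-S608 (sketch): correct the first confidence using the area error."""
    h1, h2 = box_height(first_box), box_height(second_box)
    error = abs(h1 - h2) / max(h2, 1e-6)   # relative error between the two heights
    if error > error_threshold:
        return min(first_conf, low_conf)   # large error: category not credible
    return max(first_conf, high_conf)      # small error: category credible

# Example matching the 200 px vs 400 px case above (illustrative values):
second_conf = correct_confidence(0.85, (0, 0, 80, 200), (0, -200, 200, 200))
# second_conf == 0.1, because the detected 200 px height does not match
# the expected 400 px height of the inferred second location area.
```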
  • The implementation of steps S606 to S608 of the target detection method described in FIG. 6 is described in detail below with reference to the application scenarios shown in FIG. 7 and FIGS. 8a-8b.
  • FIG. 7 is an image C obtained by the photographing device. Assume that, through steps S601 to S605, a pedestrian has been detected in candidate image area d of image C with a probability of 0.6. As can be seen from FIG. 7, the object actually presented in candidate area d is a tree; because candidate area d is far from the camera, it is a small target on the road and is therefore likely to be misjudged. Candidate area d here is the aforementioned first location area. Next, the target detection device may determine the length, along direction U, of the object presented in candidate area d.
  • Since step S603 detected the object presented in candidate area d as a pedestrian, the target detection device can determine the distance between the pedestrian and the camera under the assumption that the object in candidate area d really is a pedestrian, and then infer the second location area in which such a pedestrian would appear in the image.
  • The determination of the second location area is illustrated in FIGS. 8a-8b. Assume the focal lengths of the camera device are f_x, f_y; the world coordinate system, camera coordinate system, and image coordinate system are as shown in FIG. 8a. In the world coordinate system, the Y axis is along the direction of gravity, the Z axis is along the direction of travel of the vehicle, and the X axis is perpendicular to both the Y axis and the Z axis.
  • The Y axis of the world coordinate system maps to the V axis of the image coordinate system, and the X axis maps to the U axis. The X, Y, and Z axes of the world coordinate system map to the x, y, and z axes of the camera coordinate system, respectively. Because the image is two-dimensional, the Z axis of the world coordinate system is not considered in the mapping process.
  • The distance between the camera device and the ground is h_com. Assuming that the target object presented in candidate area d really is a pedestrian, the distance between the target object and the camera can be deduced from formula (1) or formula (2); suppose the target object is at position F shown in FIG. 8a. It should be noted that the target object at position F is hypothetical and does not necessarily exist; its purpose is to verify the correctness of the inference made in step S603 about the object presented in candidate area d. By querying a preset table, the height of a pedestrian at position F can be determined to be h_w, and from this, via formula (3), the height h' that the pedestrian would occupy in image C and hence the second location area shown in FIG. 8b can be obtained.
  • The target detection device can then compare the coordinate difference between the first location area shown in FIG. 8a and the second location area shown in FIG. 8b. As can be seen from FIGS. 8a-8b, the difference between the two areas is relatively large. The target detection device can therefore determine, based on this difference, the second confidence level that the target object presented in the first location area detected by the target detection model is a pedestrian; for example, the second confidence level may be 0.1.
  • Unlike the method shown in FIG. 3, in this embodiment, when the first confidence level is greater than the preset threshold, the second location area is determined and a misjudged target object (for example, a tree misjudged as a pedestrian) is detected based on the error between the first location area and the second location area, so that the accuracy of small-target detection on the road can be further improved.
  • Based on the foregoing embodiments, in some possible implementations a step of optimizing the target detection model may also be included. Specifically, a training sample set is randomly selected; the training sample set includes a plurality of training sample images. Each training sample image is input to the target detection model to obtain the category of the object in the training sample image and its first location area. Then, the method for determining the second location area described in step S606 is used to determine the second location area in each sample image. Finally, a second preset loss function and the back-propagation algorithm are used to iteratively adjust the weights of each layer of the target detection model, so as to optimize the target detection model. The second preset loss function indicates the difference between the first location area and the second location area.
  • By optimizing the target detection model, its detection accuracy, that is, the accuracy of road target detection, can be further improved, which provides a guarantee for subsequent obstacle detection and avoidance by autonomous vehicles.
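  • The optimization step can be pictured as a consistency loss between the first location areas predicted by the model and the second location areas inferred from the camera geometry. The L1 form below and the reuse of the infer_second_area sketch from earlier are assumptions of this illustration; the original does not fix the exact form of the second preset loss function.

```python
def second_loss(first_boxes, second_boxes):
    """Second preset loss (sketch): mean absolute difference between the predicted
    first location areas and the geometrically inferred second location areas."""
    total, n = 0.0, 0
    for b1, b2 in zip(first_boxes, second_boxes):
        total += sum(abs(c1 - c2) for c1, c2 in zip(b1, b2))
        n += 1
    return total / max(n, 1)

# Framework-agnostic outline of the optimization loop:
# 1. run the target detection model on a batch of sample images -> categories, first boxes
# 2. infer second boxes (e.g. with infer_second_area) under the road-plane assumption
# 3. loss = second_loss(first_boxes, second_boxes)
# 4. back-propagate the loss and update the weights of every layer of the model
```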
  • FIG. 9 shows the target detection device 900 provided by the embodiment of the present application.
  • The target detection device 900 includes: an acquisition module 901, configured to acquire an image using a camera device; a calibration module 902, configured to calibrate a region of interest in the image based on the parameters of the camera device and a preset driving path; a first detection module 903, configured to detect the image using a target detection algorithm to obtain the category of the target object in the image, the first location area of the target object in the image, and the confidence level of the category to which the target object belongs; and a first correction module 904, configured to correct the confidence level of the category to which the target object belongs based on the relative position relationship between the first location area and the region of interest, to obtain the first confidence level.
  • The target detection device 900 further includes: a determination module 905, configured to determine, in response to the first confidence level being greater than a preset threshold, the second location area in the image based on the parameters of the camera device, the boundary coordinates of the first location area in the image, and the preset real-world size of the object corresponding to the category; a second detection module 906, configured to detect the error between the first location area and the second location area; and a second correction module 907, configured to correct the first confidence level based on the error, to obtain the second confidence level of the category to which the target object belongs.
  • The parameters of the camera device include at least one of the following: the focal length of the camera device, the distance between the camera device and a reference surface, a conversion matrix from the camera device coordinate system to the image coordinate system, and the size of the photosensitive unit in the photosensitive element.
  • The determination module includes: a first determining sub-module, configured to determine the distance between the camera device and the target object based on the focal length of the camera device, the distance between the camera device and the reference surface, the conversion matrix from the camera device coordinate system to the image coordinate system, the size of the photosensitive unit in the photosensitive element, and the boundary coordinates of the first location area in the image; and a second determining sub-module, configured to determine the second location area in the image based on the distance between the camera device and the target object, the real-world size of the object corresponding to the detected category, the distance between the camera device and the reference surface, and the boundary coordinates of the first location area.
  • The category to which the target object belongs is selected from a plurality of preset candidate categories by matching the features of the target object with the features of objects corresponding to those candidate categories and choosing based on the matching result.
  • The first detection module includes: a setting sub-module, configured to set calibration parameters in a pre-trained target detection model, the calibration parameters being used to instruct the target detection model to calibrate a plurality of candidate regions in the image; and a detection sub-module, configured to input the image into the target detection model to obtain its output result, the output result indicating whether an object of a preset candidate category is presented in each candidate region and the confidence level of the category to which the target object belongs, where the target detection model is obtained by training a neural network based on training samples and the calibration parameters used for candidate region calibration.
  • The plurality of candidate regions in the image are predetermined based on constraint conditions; the constraint conditions include the range of the region in which an object of each preset candidate category may appear in the image, and the imaging size range of an object of each preset candidate category in the image.
  • The setting sub-module is specifically configured to: calibrate initial candidate regions in the image; and screen the initial candidate regions using the constraint conditions to obtain the plurality of candidate regions based on the screening result.
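  • The constraint-based screening of initial candidate regions can be illustrated as below: anchors whose bottom edge falls outside the road region, or whose imaged size lies outside the per-category size range, are discarded. The road-region test and the size bounds are illustrative assumptions standing in for the region range and imaging size range derived from the camera parameters.

```python
def screen_candidates(initial_boxes, road_region, size_range):
    """Keep only candidate regions consistent with the constraint conditions.

    initial_boxes: list of (u_left, v_top, u_right, v_bottom) anchors, in pixels
    road_region:   function (u, v) -> bool, True if the point lies in the road area
    size_range:    (min_height_px, max_height_px) imaging size range for the category
    """
    min_h, max_h = size_range
    kept = []
    for u_left, v_top, u_right, v_bottom in initial_boxes:
        bottom_mid = ((u_left + u_right) / 2.0, v_bottom)
        height = v_bottom - v_top
        # Constraint 1: the object rests on the road plane, so its lower boundary
        # must fall inside the road region.
        # Constraint 2: its imaged size must be plausible for the category.
        if road_region(*bottom_mid) and min_h <= height <= max_h:
            kept.append((u_left, v_top, u_right, v_bottom))
    return kept
```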
  • The target detection device further includes a model optimization module, which is specifically configured to: obtain a training sample set, the training sample set including a plurality of sample images, each of which presents a target object; input the sample images into the target detection model to obtain the category of the target object in each sample image and the first location area of the target object in the sample image; determine the second location area in each sample image based on the category of the target object in the sample image, the boundary coordinates of the first location area, and the parameters of the photographing device used to capture the sample image; and determine, using a preset loss function, the deviation between the first location area and the second location area in each training sample, and iteratively adjust the target detection model based on the deviation to obtain an optimized target detection model.
  • the target detection device 900 may include a processor, a memory, and a communication module.
  • the processor may control and manage the actions of the target detection device 900, for example, it may be used to support the target detection device 900 to execute the steps executed by each of the foregoing modules.
  • the memory can be used to support the target detection device 900 to execute and store program codes and data.
  • the communication module can be used for communication between the target detection apparatus 900 and other devices.
  • the processor may implement or execute various exemplary logic modules described in conjunction with the disclosure of this application.
  • The processor can also be a combination that implements computing functions, for example a combination including one or more microprocessors, such as a central processing unit (CPU), and may further include other general-purpose processors, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic devices, or discrete hardware components.
  • the general-purpose processor may be a microprocessor, a microcontroller, or any conventional processor.
  • the memory mentioned in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • The non-volatile memory can be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory.
  • the volatile memory may be a random access memory (Random Access Memory, RAM), which is used as an external cache.
  • By way of example and not limitation, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (Synchlink DRAM, SLDRAM), and direct Rambus random access memory (Direct Rambus RAM, DR RAM).
  • the communication module may specifically be a radio frequency circuit, a Bluetooth chip, a Wi-Fi chip, and other devices that interact with other electronic devices.
  • This embodiment also provides a computer-readable storage medium that stores computer instructions; when the computer instructions run on a computer, the computer is caused to execute the above related method steps to implement the target detection method in the above embodiments.
  • This embodiment also provides a computer program product; when the computer program product runs on a computer, the computer is caused to execute the above related steps to implement the target detection method in the above embodiments.
  • In addition, the embodiments of the present application also provide a device, which may specifically be a chip, component, or module. The device may include a processor and a memory coupled to each other; the memory is used to store computer-executable instructions, and when the device runs, the processor can execute the computer-executable instructions stored in the memory, so that the chip executes the above target detection method.
  • The processor, computer-readable storage medium, computer program product, and chip provided in this embodiment are all used to execute the corresponding methods provided above; therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding methods provided above, which are not repeated here.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of modules is only a logical function division, and there may be other divisions in actual implementation, for example, multiple modules or components can be combined or integrated.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, and the indirect coupling or communication connection of the devices may be in electrical, mechanical or other forms.
  • the units described as separate parts may or may not be physically separate, and the parts displayed as units may be one physical unit or multiple physical units, that is, they may be located in one place, or they may be distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium.
  • The technical solutions of the embodiments of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods of the various embodiments of the present application.
  • The aforementioned readable storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (read only memory, ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

Embodiments of this application provide a target detection method and apparatus. The target detection method includes: acquiring an image using a camera device; calibrating a region of interest in the image based on the parameters of the camera device and a preset driving path; detecting the image using a target detection algorithm to obtain the category to which a target object in the image belongs, a first location area of the target object in the image, and the confidence level of the category to which the target object belongs; and correcting the confidence level of the category to which the target object belongs based on the relative position relationship between the first location area and the region of interest, to obtain a first confidence level, so that the detected target object is more accurate.

Description

目标检测方法和装置
本申请要求于2020年5月14日提交中国专利局、申请号为202010408685.9、申请名称为“目标检测方法和装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及人工智能技术领域,尤其涉及一种目标检测方法和装置。
背景技术
随着科学技术的发展,人工智能(AI,Artificial Intelligence)技术得到突飞猛进的提升。在一些人工智能技术中,通常采用机器学习的方法,构建各种结构的初始模型,例如神经网络模型、支持向量机模型、决策树模型等。然后,通过对各种初始模型进行训练,以实现诸如图像识别、自然语言处理等目的。其中,图像识别还包括对图像中呈现的文字识别和对图像中呈现的各个对象进行目标检测。
相关目标检测技术中,在诸如需要进行小目标检测的场景中,例如自动驾驶场景,对道路上较远处或者较小的目标检测时,通常无法进行有效的识别。例如,将远处指示牌上的行人标识识别成道路上的行人。
由此,如何对图像中的小目标进行有效的识别成为需要解决的问题。
发明内容
通过采用本申请所示的目标检测方法和装置,可以降低对图像中的小目标识别错误的概率,有利于提高目标检测的检测精度。
为达到上述目的,本申请采用如下技术方案:
第一方面,本申请实施例提供一种目标检测方法,该目标检测方法应用于电子设备,该目标检测方法包括:利用摄像装置获取图像;基于所述摄像装置的参数和预设行驶路径,标定出所述图像中的感兴趣区域;利用目标检测算法对所述图像进行检测,得到所述图像中的目标对象所属的类别、所述目标对象在所述图像中的第一位置区域和所述目标对象所属的类别的置信度;基于所述第一位置区域与所述感兴趣区域之间的相对位置关系,修正所述目标对象所属的类别的置信度,得到第一置信度。
通过利用所检测出的目标对象所在的位置区域和感兴趣区域之间的位置关系确定所属的类别的第一置信度,可以进一步验证所检测出的目标对象所属的类别是否正确,过滤出一些不符合逻辑或者不符合常理的类别检测结果,从而提高目标检测的准确性。
基于第一方面,当第一置信度小于预设阈值时,可以直接输出该第一置信度;当第一置信度小于预设阈值时,目标检测方法还包括:基于所述摄像装置的参数、所述第一位置区域在图像中的边界坐标、预先设置的所述类别对应的对象在现实世界中的大小,确定所述图像中的第二位置区域;检测所述第一位置区域和所述第二位置区域之间的误差;基于所述误差,修正所述第一置信度,得到所述目标所属的类别的第二置信度。
通过利用第二位置区域和第一位置区域之间的误差修正第一置信度,可以进一步降低对道路上的目标对象误判的情况(例如将道路远处的树木误判为行人),从而可以进一步提高目标对象检测的准确性。
在一种可能的实现方式中,所述摄像装置的参数包括以下至少一项:所述摄像装置的焦距、所述摄像装置与参考面的距离、由所述摄像装置坐标系转换为图像坐标系的转换矩阵和感光元件中的感光单元的尺寸。
在一种可能的实现方式中,所述基于所述摄像装置的参数、所述第一位置区域在图像中的边界坐标、所述类别对应的对象在现实世界中的大小,确定所述图像中的第二位置区域,包括:基于所述摄像装置的焦距、所述摄像装置与参考面的距离、由所述摄像装置坐标系转换为图像坐标系的转换矩阵、感光元件中的感光单元的尺寸和所述第一位置区域在图像中的边界坐标,确定所述摄像装置与所述目标对象之间的距离;基于所述摄像装置与目标对象之间的距离、所检测出的类别对应的对象在现实世界中的大小、所述摄像装置与参考面的距离、以及所述第一位置区域的边界坐标,确定所述图像中的第二位置区域。
在一种可能的实现方式中,所述目标对象所属的类别是将所述目标对象的特征与多个预设候选类别对应的对象的特征进行匹配,基于匹配结果,从所述预设候选类别中选择出的。
基于第一方面,在一种可能的实现方式中,上述目标检测算法可以是预先训练的目标检测模型。其中,所述利用目标检测算法对所述图像进行检测,得到所述图像中的目标对象所属的类别、所述目标对象在所述图像中的第一位置区域和所述目标对象所属的类别的置信度,包括:在预先训练的目标检测模型中设置标定参数,所述标定参数用于指示所述目标检测模型在所述图像中标定出多个候选区域;将所述图像输入至所述目标检测模型,得到所述目标检测模型的输出结果,所述输出结果用于指示各所述候选区域中是否呈现有预设候选类别的对象和所述目标对象所属的类别的置信度,其中,所述目标检测模型是基于训练样本和用于进行候选区域标定的标定参数,对神经网络训练得到的。
基于第一方面,在一种可能的实现方式中,所述图像中的多个候选区域是基于约束条件预先确定的;所述约束条件包括:各所述预设候选类别对应的对象呈现在所述图像中的区域范围、以及各所述预设候选类别对应的对象在所述图像中的成像大小范围。
通过采用约束条件对标定的位置区域进行筛选,可以过滤掉一些不必要进行检测的位置区域,降低了图像中待检测的位置区域的数目,从而提高目标检测模型的检测速度和检测准确度。
在一种可能的实现方式中,所述确定所述图像中的多个候选区域,包括:在所述图像中标定出初始候选区域;利用所述约束条件对所述初始候选区域进行筛选,基于筛选结果,得到所述多个候选区域。
在一种可能的实现方式中,所述方法还包括对所述目标检测模型的优化步骤,所述优化步骤包括:获取训练样本集,所述训练样本集包括多个样本图像,各所述样本图像中呈现有目标对象;将样本图像输入至所述目标检测模型,得各样本图像中的目标对象所属的类别和目标对象在样本图像中的第一位置区域,基于样本图像中的目标对象所属的类别、第一置区域的边界坐标以及用于拍摄样本图像的拍摄设备的参数,确定各样本图像中的第二位置区;利用预设损失函数确定各训练样本中第一位置区域和第二位置区域之间的偏差, 基于所述偏差,迭代调整所述目标检测模型,得到优化后的目标检测模型。
通过对目标检测模型进行优化,可以进一步提高目标检测模型的检测准确度,也即提高道路目标检测的准确性,为后续自动驾驶车辆进行障碍物检测与躲避等提供保障。
第二方面,本申请实施例提供一种目标检测装置,该目标检测装置包括:获取模块,用于利用摄像装置获取图像;标定模块,用于基于所述摄像装置的参数和预设行驶路径,标定出所述图像中的感兴趣区域;第一检测模块,用于利用目标检测算法对所述图像进行检测,得到所述图像中的目标对象所属的类别、所述目标对象在所述图像中的第一位置区域和所述目标对象所属的类别的置信度;第一修正模块,用于基于所述第一位置区域与所述感兴趣区域之间的相对位置关系,修正所述目标对象所属的类别的置信度,得到第一置信度。
基于第二方面,在一种可能的实现方式中,所述目标检测装置还包括:确定模块,用于响应于所述第一置信度大于预设阈值,基于所述摄像装置的参数、所述第一位置区域在图像中的边界坐标、预先设置的所述类别对应的对象在现实世界中的大小,确定所述图像中的第二位置区域;第二检测模块,用于检测所述第一位置区域和所述第二位置区域之间的误差;第二修正模块,用于基于所述误差,修正所述第一置信度,得到所述目标所属的类别的第二置信度。
基于第二方面,在一种可能的实现方式中,所述摄像装置的参数包括以下至少一项:所述摄像装置的焦距、所述摄像装置与参考面的距离、由所述摄像装置坐标系转换为图像坐标系的转换矩阵和感光元件中的感光单元的尺寸。
基于第二方面,在一种可能的实现方式中,所述确定模块包括:第一确定子模块,用于基于所述摄像装置的焦距、所述摄像装置与参考面的距离、由所述摄像装置坐标系转换为图像坐标系的转换矩阵、感光元件中的感光单元的尺寸和所述第一位置区域在图像中的边界坐标,确定所述摄像装置与所述目标对象之间的距离;第二确定子模块,用于基于所述摄像装置与目标对象之间的距离、所检测出的类别对应的对象在现实世界中的大小、所述摄像装置与参考面的距离、以及所述第一位置区域的边界坐标,确定所述图像中的第二位置区域。
基于第二方面,在一种可能的实现方式中,所述目标对象所属的类别是将所述目标对象的特征与多个预设候选类别对应的对象的特征进行匹配,基于匹配结果,从所述预设候选类别中选择出的。
基于第二方面,在一种可能的实现方式中,所述第一检测模块包括:设置子模块,用于在预先训练的目标检测模型中设置标定参数,所述标定参数用于指示所述目标检测模型在所述图像中标定出多个候选区域;检测子模块,用于将所述图像输入至所述目标检测模型,得到所述目标检测模型的输出结果,所述输出结果用于指示各所述候选区域中是否呈现有预设候选类别的对象和所述目标对象所属的类别的置信度,其中,所述目标检测模型是基于训练样本和用于进行候选区域标定的标定参数,对神经网络训练得到的。
基于第二方面,在一种可能的实现方式中,所述图像中的多个候选区域是基于约束条件预先确定的;所述约束条件包括:各所述预设候选类别对应的对象呈现在所述图像中的区域范围、以及各所述预设候选类别对应的对象在所述图像中的成像大小范围。
基于第二方面,在一种可能的实现方式中,所述设置子模块具体用于:在所述图像中 标定出初始候选区域;利用所述约束条件对所述初始候选区域进行筛选,基于筛选结果,得到所述多个候选区域。
基于第二方面,在一种可能的实现方式中,所述目标检测装置还包括模型优化模块,所述模型优化模块具体用于:获取训练样本集,所述训练样本集包括多个样本图像,各所述样本图像中呈现有目标对象;将样本图像输入至所述目标检测模型,得各样本图像中的目标对象所属的类别和目标对象在样本图像中的第一位置区域,基于样本图像中的目标对象所属的类别、第一置区域的边界坐标以及用于拍摄样本图像的拍摄设备的参数,确定各样本图像中的第二位置区;利用预设损失函数确定各训练样本中第一位置区域和第二位置区域之间的偏差,基于所述偏差,迭代调整所述目标检测模型,得到优化后的目标检测模型。
第三方面,本申请实施例提供了一种电子设备,该电子设备包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机程序,其特征在于,所述处理器执行所述计算机程序时,使得电子设备实现如第一方面所述的方法。
第四方面,本申请实施例提供一种计算机可读存储介质,计算机可读存储介质存储有指令,当指令在计算机上运行时,用于执行上述第一方面所述的方法。
第五方面,本申请实施例提供一种计算机程序或计算机程序产品,当计算机程序或计算机程序产品在计算机上被执行时,使得计算机执行如第一方面所述的方法。
应当理解的是,本申请的第二至五方面与本申请的第一方面的技术方案一致,各方面及对应的可行实施方式所取得的有益效果相似,不再赘述。
附图说明
图1是本申请实施例提供的应用于本申请实施例的一种应用场景的硬件结构示意图;
图2是本申请实施例提供的现有技术中对于图象呈现的对象误判的示意图;
图3是本申请实施例提供的一个目标检测方法的示意性流程图;
图4是本申请实施例提供的目标检测模型训练过程中提供的负样本的一个示意图;
图5a-图5e是本申请实施例提供的目标检测方法的一个应用场景示意图;
图6是本申请实施例提供的又一个目标检测方法的示意性流程图;
图7是本申请实施例提供的目标检测方法的又一个应用场景示意图;
图8a-图8b是本申请实施例提供的在图7所示的应用场景下第二位置区域的确定方法的示意图;
图9是本申请实施例提供的目标检测装置的一个示意图。
具体实施方式
下面结合本申请实施例中的附图对本申请实施例进行描述。以下描述中,参考形成本申请一部分并以说明之方式示出本申请实施例的具体方面或可使用本申请实施例的具体方面的附图。应理解,本申请实施例可在其它方面中使用,并可包括附图中未描绘的结构或逻辑变化。因此,以下详细描述不应以限制性的意义来理解,且本申请的范围由所附权利要求书界定。例如,应理解,结合所描述方法的揭示内容可以同样适用于用于执行所述方法的对应设备或系统,且反之亦然。例如,如果描述一个或多个具体方法步骤,则对应 的设备可以包含如功能单元等一个或多个单元,来执行所描述的一个或多个方法步骤(例如,一个单元执行一个或多个步骤,或多个单元,其中每个都执行多个步骤中的一个或多个),即使附图中未明确描述或说明这种一个或多个单元。另一方面,例如,如果基于如功能单元等一个或多个单元描述具体装置,则对应的方法可以包含一个步骤来执行一个或多个单元的功能性(例如,一个步骤执行一个或多个单元的功能性,或多个步骤,其中每个执行多个单元中一个或多个单元的功能性),即使附图中未明确描述或说明这种一个或多个步骤。进一步,应理解的是,除非另外明确提出,本文中所描述的各示例性实施例和/或方面的特征可以相互组合。
本申请所述的目标检测方法,可以应用于图像识别领域、需要对图像中的小目标进行检测和识别的各种场景中。下面以自动驾驶场景中、对道路上的小目标检测为例,对本申请进行详细说明。
请参考图1,图1为本申请实施例中的车辆的结构示意图。
耦合到车辆100或包括在车辆100中的组件可以包括推进系统110、传感器系统120、控制系统130、外围设备140、电源101、计算装置107以及用户接口108。计算装置107包括处理器102、收发器103和存储器104。计算装置107以是车辆100的控制器或控制器的一部分。存储器104包括处理器102可以运行的指令106,并且还可以存储地图数据105。车辆100的组件可以被配置为以与彼此互连和/或与耦合到各系统的其它组件互连的方式工作。例如,电源101可以向车辆100的所有组件提供电力。计算装置107可以被配置为从推进系统110、传感器系统120、控制系统130和外围设备140接收数据并对它们进行控制。计算装置107可以被配置为在用户接口108上生成图像的显示并从用户接口108接收输入。
在一些可能的实施方式中,车辆100还可以包括更多、更少或不同的系统,并且每个系统可以包括更多、更少或不同的组件。此外,示出的系统和组件可以按任意种的方式进行组合或划分,本申请实施例对此不做具体限定。
下面,对上述各个系统进行说明。
上述推进系统102可以用于车辆100提供动力运动。仍参见图1所示,推进系统102可以包括引擎/发动机114、能量源113、传动装置(transmission)112和车轮/轮胎111。当然,推进系统102还可以额外地或可替换地包括除了图1所示出组件外的其它组件,本申请实施例对此不做具体限定。
传感器系统104可以包括用于感测关于车辆100所位于的环境的信息的若干个传感器。如图所示,传感器系统的传感器包括全球定位系统GPS126、惯性测量单元(inertial measurement unit,IMU)125、激光雷达传感器124、视觉传感器123、毫米波雷达传感器122以及用于为修改传感器的位置和/或朝向的致动器121中的至少一个。传感器系统120也可以包括额外的传感器,包括例如监视车辆100的内部系统的传感器(例如,O2监视器、燃油量表、机油温度,等中的至少一个)。传感器系统120也可以包括其它传感器。
全球定位系统(global positioning system,GPS)模块126可以为用于估计车辆100的地理位置的任何传感器。为此,GPS模块126可能包括收发器,基于卫星定位数据,估计车辆100相对于地球的位置。在示例中,计算装置107可以用于结合地图数据105使用GPS模块126来估计车辆100可以在其上行驶的道路上的车道边界的位置。GPS模块126 也可以采取其它形式。
IMU 125可以是用于基于惯性加速度及其任意组合来感测车辆100的位置和朝向变化。在一些示例中,传感器的组合可以包括例如加速度计和陀螺仪。传感器的其它组合也是可能的。
激光雷达传感器(light detection and ranging,LiDAR)124可以被看作物体检测系统,该传感器使用光感测或检测车辆100所位于的环境中的物体。通常,LIDAR 124是可以通过利用光照射目标来测量到目标的距离或目标的其它属性的光学遥感技术。作为示例,LIDAR 124可以包括被配置为发射激光脉冲的激光源和/或激光扫描仪,和用于为接收激光脉冲的反射的检测器。例如,LIDAR 124可以包括由转镜反射的激光测距仪,并且以一维或二维围绕数字化场景扫描激光,从而以指定角度间隔采集距离测量值。在示例中,LIDAR 124可以包括诸如光源(例如,激光)、扫描仪和光学系统、光检测器和接收器电子器件之类的组件,以及位置和导航系统。LIDAR 124通过扫描一个物体上反射回来的激光确定物体的距离,可以形成精度高达厘米级的三维(3 dimensions,3D)环境图。
视觉传感器(visual sensor)123可以用于获取车辆100所位于的环境的图像的任何摄像头(例如,静态摄像头、视频摄像头等)。为此,视觉传感器123可以被配置为检测可见光,或可以被配置为检测来自光谱的其它部分(如红外光或紫外光)的光。其它类型的视觉传感器也是可能的。视觉传感器123可以是二维检测器,或可具有三维空间范围的检测器。在一些可能的实施方式中,视觉传感器123例如可以是距离检测器,其被配置为生成指示从视觉传感器123到环境中的若干点的距离的二维图像。为此,视觉传感器123可使用一种或多种距离检测技术。例如,视觉传感器123可被配置为使用结构光技术,其中车辆100利用预定光图案,诸如栅格或棋盘格图案,对环境中的物体进行照射,并且使用视觉传感器123检测从物体的预定光图案的反射。基于反射的光图案中的畸变,车辆100可被配置为检测到物体上的点的距离。预定光图案可包括红外光或其它波长的光。
毫米波雷达传感器(millimeter-wave radar)122通常指波长为1~10mm的物体检测传感器,频率大致范围是10GHz~200GHz。毫米波雷达测量值具备深度信息,可以提供目标的距离;其次,由于毫米波雷达有明显的多普勒效应,对速度非常敏感,可以直接获得目标的速度,通过检测其多普勒频移可将目标的速度提取出来。目前主流的两种车载毫米波雷达应用频段分别为24GHz和77GHz,前者波长约为1.25cm,主要用于短距离感知,如车身周围环境、盲点、泊车辅助、变道辅助等;后者波长约为4mm,用于中长距离测量,如自动跟车、自适应巡航(adaptive cruise control,ACC)、紧急制动(autonomous emergency braking,AEB)等。
控制系统130可被配置为控制车辆100及其组件的操作。为此,控制系统130可包括转向单元136、油门135、制动单元134、传感器融合单元133、计算机视觉系统132、导航或路线控制(pathing)系统131。当然,控制系统130还可以额外地或可替换地包括除了图1所示出组件外的其它组件,本申请实施例对此不做具体限定。
外围设备140可被配置为允许车辆100与外部传感器、其它车辆和/或用户交互。为此,外围设备140可以包括例如无线通信系统144、触摸屏143、麦克风142和/或扬声器141。当然,外围设备140可以额外地或可替换地包括除了图1所示出组件外的其它组件,本申请实施例对此不做具体限定。
电源101可以被配置为向车辆100的一些或全部组件提供电力。为此,电源110可以包括例如可再充电锂离子或铅酸电池。在一些示例中,一个或多个电池组可被配置为提供电力。其它电源材料和配置也是可能的。在一些可能的实现方式中,电源110和能量源113可以一起实现。
包括在计算装置107中的处理器102可包括一个或多个通用处理器和/或一个或多个专用处理器(例如,图像处理器、数字信号处理器等)。就处理器102包括多于一个处理器而言,此时处理器可单独工作或组合工作。计算装置107可以实现基于通过用户接口108接收的输入控制车辆100的功能。
收发器103用于计算装置107与各个系统间的通信。
存储器104进一步可以包括一个或多个易失性存储组件和/或一个或多个非易失性存储组件,诸如光、磁和/或有机存储装置,并且存储器104可全部或部分与处理器102集成。存储器104可以包含可由处理器102运行的指令106(例如,程序逻辑),以运行各种车辆功能,包括本申请实施例中描述的功能或方法中的任何一个。
车辆100的组件可以被配置为以与在其各自的系统内部和/或外部的其它组件互连的方式工作。为此,车辆100的组件和系统可通过系统总线、网络和/或其它连接机制连接在一起。
在本申请实施例中,结合上述车辆100的结构,上述车辆在自动驾驶模式的过程中,通常采用目标检测算法对道路上的目标实时检测,以确保车辆行驶的安全性。例如,通过目标检测,可以告知车辆可行驶区域并标记出障碍物的位置,进而辅助车辆避障。
当采用目标检测算法进行目标检测时,首先,计算装置通过深度学习训练一个可以识别特定类别物体的神经网络,这里,特定类别物体可以为行人、车辆、树木、房屋、道路设施等常见目标物体。在进行目标检测时,计算装置通过该神经网络可以识别出上述特定类别物体。由于神经网络学习到的是上述各特定类别物体的特征,当图像中有些相似的特征出现时,通常无法进行有效的识别,容易产生误判。
例如图2所示,在图2中,指示牌上呈现有行人标识,该行人并非实际的行人对象。但是,该指示牌上呈现的行人标识所具有的特征通常与道路远处行人的特征较为相似,导致神经网络将指示牌上呈现的行人标识误判为道路小目标行人,例如图2所示的,神经网络判断图2所示的指示牌上的行人标识为行人的概率是0.85,降低了目标检测的准确性。
为了解决上述问题,本申请实施例提供一种目标检测方法,该方法可以应用于目标检测装置。该目标检测装置可以为上述实施例中所述的计算装置或者计算装置中的一部分。
请参考图3,图3为本申请实施例所示的目标检测方法的示意性流程图。参见图3所示,该方法包括:
S301,利用摄像装置获取图像。
这里的摄像装置即为上述传感器系统中的视觉传感器,用于采集车体前方道路的图像。该图像中可以包括行人、车辆、路面、隔离栏等物体,当然,还可以包括人行道、行道树、交通信号灯等,本申请实施例不做具体限定。
在实际应用中,摄像装置可以为单目摄像头,由单目摄像头在一个时刻拍摄一张待处理的图像。或者,摄像装置还可以包括多目摄像头,这些摄像头可以在物理上合设于一个摄像装置中,还可以在物理上分设于多个摄像装置中。通过多目摄像头在同一时刻拍摄多 张图像,并可以根据这些图像进行处理,得到一张待识别的图像。当然,摄像装置还可以为其他情况,本申请实施例不做具体限定。
具体实现中,摄像装置可以实时地采集图像,或者可以周期性地采集图像。该周期如3s、5s、10s等。摄像装置还可以通过其他方式采集图像,本申请实施例不做具体限定。摄像装置采集到图像后,可以将图像传递给上述目标检测装置,此时,目标检测装置可以获得该图像。这里需要说明的是,S301可以是在车辆启动后,或者车辆启动自动驾驶功能之后执行。
步骤S302,基于摄像装置的参数和预设行驶路径,标定出图像中的感兴趣区域(region of interest,ROI)。
上述摄像装置所获取的图像通常为路况图像。该图像中通常呈现道路前方以及两侧的物体。例如,道路前方或侧方的车辆、行人、道路两侧的树木、位于车道内的轮胎、木箱等。由此,图像中的感兴趣区域可以为前方道路和两侧道路。在图像中,ROI可以是方框、圆、椭圆或者不规则多边形等方式勾勒出的区域。如图5b所示,其示意性的示出了图像中的ROI。
在具体实施过程中,ROI可以通过摄像装置的参数和预设行驶路径确定出来。摄像装置的参数例如包括但限于:摄像摄装置与参考平面之间的高度,摄像装置安装于车辆的位置;预设行驶路径例如包括但不限于:车辆所处的车道、车辆行驶规则(例如靠左行驶或靠右行驶)。上述目标检测装置在获取到图像后,可以基于摄像装置的参数和预设行驶路径,标定出图像中的感兴趣区域。
步骤S303,利用目标检测算法对所获取的图像进行检测,得到图像中的目标对象所属的类别、目标对象在图像中的位置区域和目标对象所属的类别的置信度。
步骤S303具体可以通过如下步骤实现:
第一步:在目标检测模型中设置标定参数,该标定参数用于指示目标检测模型在图像中标定出多个候选区域。
这里,标定参数为多个候选区域中的每个候选区域在图像中的位置参数(例如在图像中的位置坐标)。候选区域在图像中的位置为预先确定的。这里确定候选区域也可以称为设置先验框(priors anchor)。
实践中,可以采用诸如人工设置,K-means聚类,RPN(region proposal network)算法或者选择搜索(selective search)算法等现有的各种方法,来确定出多个候选区域。然后,将所确定出的各候选区域在图像中的位置参数设置于目标检测模型中。其中,确定候选区域的方法可以通过如下两种方式实现:
作为一种可能的实现方式,针对图像的尺寸,将图像等分为多个单元。然后,对于图像中每个等分的单元,设置长宽比不同的多个先验框,该多个先验框的长宽比的差异可以呈等差数列。所设置的多个先验证框也即多个候选区域。
作为另一种可能的实现方式,利用切分方法对图像进行切分,得到将上述图像切分后的多个图像块的候选区域图像集合。计算候选区域图像集合中每相邻两个区域的相似度(例如可以从诸如颜色相似度、纹理相似度等多个维度确定),最后对相似度高于预设阈值的区域进行合并,确定出最终的候选区域。
第二步:将图像输入至预先训练的、已进行标定参数设置的目标检测模型,以使目 标检测模型检测各个候选区域中呈现预设候选类别的目标对象的置信度以及候选区域的定位偏差。当检测出某一个或某几个候选区域呈现预设候选类别中的某一类或某几类对象的置信度较高,该置信度超过阈值时,则预测该一个或几个区域呈现有预设候选类别的对象。该候选区域的定位偏差是目标检测模型对候选区域进行边缘检测以及对多个候选区域的全连接层(或全卷积层)检测确定的。目标检测模型通过上述候选区域的定位偏差可以调整候选区域在图像中的位置。然后,目标检测模型输出呈现有目标对象的候选区域在图像中的位置信息、候选区域呈现的目标对象所属的类别以及目标对象所属的类别的置信度。预设候选类别可以为道路上的障碍物,包括但不限于行人、车辆、树木、轮胎、箱体、施工牌等。目标检测模型输出的呈现有目标对象的候选区域在图像中的位置信息,是基于候选区域的定位偏差、重新调整在图像中的位置后的位置信息。
在一种可能的实现方式中,第一步中所述的候选区域也可以是基于约束条件确定的。
具体实现中,可以利用上述selective search算法、人工设置、RPN或者K-means聚类等方法在图像中标定出初始候选区域;然后利用约束条件对初始候选区域进行筛选,从而得到最终的候选区域。
约束条件具体是指:预设候选类别中的各对象呈现在图像中的区域范围、预设候选类别中的各对象在图像中的成像大小范围。
这里的预设候选类别中的各对象呈现在图像中的区域范围,是基于各对象在现实世界中可能出现的位置区域确定的。实践中,可以基于摄像装置的焦距、感光元件中的感光单元的尺寸、摄像装置的光心参数、路面在世界坐标系中的位置坐标和摄像装置与参考面之间的高度,将各对象在现实世界中可能出现的位置区域映射至图像中,该图像中的映射区域即为对象呈现在图像中的区域范围。
这里的预设候选类别中的各对象在图像中的成像大小范围,是基于目标对象与摄像装置之间的距离、各对象在现实世界中高度和宽度决定的。在不同的距离下,同一类别的物体在同一拍摄装置所拍摄的图像中的成像大小均不相同。实践中,可以基于摄像装置的焦距、感光元件中的感光单元的尺寸、摄像装置的光心参数、路面在世界坐标系中的位置坐标、摄像装置与参考面之间的高度、和各对象在现实世界中的高度和宽度,确定出在不同距离下各对象呈现在图像中的大小,然后确定出各对象在图像中的成像大小范围。
作为示例,假设候选目标类别中仅包括行人,上述约束条件包括:行人可能出现的位置为路面区域,行人在图像中的成像大小范围为在a-b,a小于b。其中,行人距离摄像装置为50米时,在图像中的成像大小为a,行人距离摄像装置为300米时,在图像中的成像大小为b,50米-300米可以认为摄像装置所能拍摄的距离范围。行人在图像中的成像大小,是基于行人在现实世界中的高度、宽度、行人距离摄像装置的距离、摄像装置的焦距、感光元件中的感光单元的尺寸、摄像装置的光心参数、摄像装置与参考面之间的高度而确定的。基于摄像装置的焦距、摄像装置的光心参数、路面在世界坐标系中的位置坐标和摄像装置与参考面之间的高度,可以标定出图像中呈现的路面的区域范围,记为路面范围。然后,从初始候选区域中筛选出位于路面范围内、且大小在a-b之间的候选区域。从而将筛选出的区域作为最终的候选区域。
上述所述的参考面可以为水平面。
通过采用约束条件对标定的位置区域进行筛选,可以过滤掉一些不必要进行检测的位置区域,降低了图像中待检测的位置区域的数目,从而提高目标检测模型的检测速度和检测准确度。
第二步中所述的目标检测模型,是基于训练样本集和预设损失函数,对神经网络训练得到的。
具体的,训练样本集包括样本图像和对样本图像的标注信息。这里的样本图像中的目标对象为道路小目标对象,也即是说,目标对象在图像中所占的位置较小(例如小于某一预设面积阈值)。
样本图像包括正样本图像和负样本图像,负样本图像包括:所呈现的对象具有其中一种预设候选类别的形状或轮廓、但属于其他类别的图像,该其他类别可能属于预设候选类别的另外一种,也可能不属于预设候选类别的任意一种。例如,局部位置具有行人轮廓的树木的图像、具有小狗轮廓的垃圾桶的图像、或者图4所示的显示有行人的指示牌的图像。标注信息用于指示正样本图像中的对象所属的类别和该对象在样本图像中的位置。标注信息还用于指示负样本图像呈现的对象所属的类别或者不属于任意一种预设候选类别,以及图像呈现的对象在样本图像中的位置。作为示例,假设预设候选类别包括二类:行人和树木。其中,行人用1表示、树木用2表示,不属于任意一类用0表示。对如图4所示的负样本的标注信息为0,位置区域(a,b,c,d)。a,b,c,d分别为图4所示的矩形框的四个顶点在图4中的图像坐标。
基于样本图像的尺寸和待检测目标的尺寸,结合上述约束条件和候选区域确定方法,得到候选区域,然后将候选区域在图像中的位置参数设置于神经网络中。
将样本图像输入至已经进行参数设置的神经网络,得到随机输出信息,该随机输出信息包括样本图像中呈现的对象对应的类别、在图像中的位置区域以及呈现的对象对应的类别的置信度。
然后,利用预设损失函数计算输出信息与标注信息之间的偏差,基于该偏差,迭代调整神经网络的权重参数,从而得到上述目标检测模型。这里的预设损失函数可以包括但不限于:均方差函数等。上述标注信息可以包括标注的置信度,这里的计算输出信息与标注信息之间的偏差,可以是指呈现的对象对应的类别的置信度与标注的置信度之间的偏差。
需要说明的是,上述神经网络可以包括卷积层、池化层、隐藏层、全连接层等。各层的数目可以根据所要识别的类别的数目、每一张图像中呈现的目标类别的数目以及图像的像素数目确定。
步骤S304,基于所确定出的位置区域与感兴趣区域之间的相对位置关系,修正目标对象所属的类别的置信度,得到第一置信度。
根据步骤S303所确定出的感兴趣区域的边界坐标以及步骤S302中所确定出的位置区域的边界坐标,确定位置区域的下边界坐标是否在ROI范围内。当第一位置的下边界不位于ROI范围内时,可以将第一置信度设置较低的置信度值,例如0、0.1等。也即是说,目标对象属于该类别的概率非常低。当第一位置的下边界位于ROI范围内时,可以将第一置信度设置较高的置信度值,例如0.8、0.9、1等。
作为示例,当所检测出的目标对象为行人时,行人通常与地面有接触,无论是脚与地 面接触,还是利用诸如摩托或者自行车等交通工具在地面上行驶。这时,图像中的ROI为地面。然后,判断人像所在的位置区域的下边界是否位于地面范围内,从而判断图像中呈现的人与地面是否有接触。当检测出人像所在的位置区域的下边界位于地面范围内时,说明人与地面有接触,也即目标对象属于人的置信度较高,则将第一置信度设置较高的置信度值;当检测出人像所在的位置区域的下边界没有位于地面范围内时,说明人与地面没有接触,此时人相当于悬浮在半空中,此时目标对象属于人的置信度较低,则将第一置信度设置较低的置信度值。
从图3所示的实施例可以看出,通过利用所检测出的目标对象所在的位置区域和ROI之间的位置关系确定所属的类别的第一置信度,可以进一步验证所检测出的目标对象所属的类别是否正确,过滤出一些不符合逻辑或者不符合常理的类别检测结果,从而提高目标检测的准确性。
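  • The ROI-based correction of step S304 can be sketched as a simple containment test on the lower boundary of the first location area. The fixed 0.1/0.9 output values and the midpoint-of-lower-boundary test are illustrative assumptions; the method only requires that a lower boundary outside the ROI yields a low first confidence and one inside the ROI yields a high first confidence.

```python
def correct_confidence_with_roi(category_conf, first_box, roi_contains,
                                low_conf=0.1, high_conf=0.9):
    """Step S304 (sketch): correct the category confidence using the ROI.

    first_box:    (u_left, v_top, u_right, v_bottom) of the first location area.
    roi_contains: function (u, v) -> bool, True if the point lies in the ROI
                  (e.g. the ground region calibrated from the camera parameters
                  and the preset driving path).
    """
    u_left, _, u_right, v_bottom = first_box
    bottom_mid = ((u_left + u_right) / 2.0, v_bottom)
    # A pedestrian or vehicle on the road must touch the ground, so the lower
    # boundary of its location area should fall inside the ground ROI.
    if roi_contains(*bottom_mid):
        return max(category_conf, high_conf)
    return min(category_conf, low_conf)
```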
结合图5a-图5e所示的应用场景,对图3所述的目标检测方法的实现进行具体描述。
首先,通过摄像装置获取到如图5a所示的图像A。
接着,目标检测装置可以确定图像A中的感兴趣区域。图像A中的感兴趣区域可以为地面范围的区域,如图5b所示。此时,设置于自动驾驶车辆中的目标检测装置中可以记录图像A中感兴趣区域的边界坐标。
然后,利用上述第一步所述的候选区域标定方法,在图像中标定出多个候选区域图像,得到图像B,如图5c所示。从图5c中可以看出,在图像B上分布有大量矩形框,其中,每一个矩形框即为一个候选区域。需要说明的是,图5c所示的矩形框为示意性的,实际应用中,会包括更多或更少的矩形框,并且每一个矩形框的大小也可以根据应用场景的需要来确定。
再次,将图像B输入至预先训练的目标检测模型,从而确定出各候选区域中是否呈现有预设候选类别的目标对象,如图5d所示。图5d示出了目标检测模型输出的检测结果。从图5d中可以看出,目标检测模型检测出候选区域a呈现的目标对象为行人,且概率为0.85;候选区域b呈现的目标对象为树木,且概率为0.9,候选区域c呈现的目标对象为行人,且概率为0.7。该候选区域a、候选区域b和候选区域c即为目标检测模型检测出的位置区域。
最后,目标检测装置可以将各个候选区域图像的图像坐标与感兴趣区域的边界坐标进行比较,确定各个候选区域图像的图像坐标是否在感兴趣区域范围内。如图5e所示,候选区域c并没有位于图像的感兴趣区域范围内。对于路上的行人来说,其脚应该与地面接触,而候选区域c中的行人的脚未与底面接触到。由此,目标检测装置可以判断候选区域图像c呈现的对象是行人的置信度为0.1。
请继续参考图6,其示出了本申请提供的目标检测方法又一个实施例的流程图,该目标检测方法包括:
S601,利用摄像装置获取图像。
S602,基于摄像装置的参数和预设行驶路径,标定出图像中的感兴趣区域。
S603,利用目标检测算法对所获取的图像进行检测,得到图像中的目标对象所属的类别、目标对象在图像中的位置区域和目标对象所属的类别的置信度。
S604,基于所确定出的位置区域与感兴趣区域之间的相对位置关系,修正目标对象所 属的类别的置信度,得到第一置信度。
其中,步骤S601-步骤S604的具体实现以及所带来的有益效果参考图3所示的实施例中的步骤S301-步骤S304的相关描述,在此不再赘述。
S605,检测步骤S604所确定出的目标对象所述的类别的第一置信度是否大于预设阈值。大于预设阈值时,执行步骤S606-步骤S608。小于等于预设阈值时,输出第一置信度的值。
S606,基于摄像装置的参数、第一位置区域在图像中的边界坐标、预先设置的类别对应的对象在现实世界中的大小,确定图像中的第二位置区域。
这里的摄像装置的参数具体包括但不限于:摄像装置的焦距、感光元件中的感光单元的尺寸、摄像装置的转换矩阵、摄像装置与参考面的距离和摄像装置的光心参数。
这里的参考面可以为地面。
步骤S606具体可以通过如下步骤实现。
第一步:基于摄像装置的焦距、摄像装置与参考面的距离、由摄像装置坐标系转换为图像坐标系的转换矩阵、感光元件中的感光单元的尺寸和第一位置区域在图像中的边界坐标,确定摄像装置与目标对象之间的距离。
具体来说,假设步骤S604中所检测出的目标对象所属的类别是正确的,查询该类别对应的物体在现实世界中的大小。例如,路上行人的身高通常在130cm-190cm之间,横向宽度通常在43cm-55cm之间。通过公式(1),也即世界坐标系到图像平面坐标系之间的转换公式,可以确定出摄像装置与目标物体之间的距离。
Z_w·p = K·R·[I | -T]·P_w      (1)
其中,P_w = [X_w Y_w Z_w 1]^T为目标在世界坐标系中的坐标,p = [u v 1]^T为目标在图像中的成像坐标,K为摄像装置的转换矩阵,R为从世界坐标系到摄像装置坐标系的旋转矩阵,T为从世界坐标系到摄像装置坐标系的平移矩阵,其中摄像装置的高度设置于该矩阵中,I为单位对角阵,Z_w代表摄像装置与目标物体之间的距离。
此外,公式(2)为应用于本申请实施例中的、对公式(1)细化和推导后得到的确定摄像装置与目标物体之间的距离的公式。其中,通过公式(2),也即根据目标在真实世界中的大小和目标在成像平面中的大小,确定摄像装置与目标物体之间的距离Zw。
Z_w = h_w·f_y/(h·d_y),或者 Z_w = w_w·f_x/(w·d_x)      (2)
其中,f_x、f_y分别是摄像装置在x、y轴方向的焦距,d_x、d_y分别是感光元件上x、y轴方向感光单元的尺寸,w_w、h_w分别为目标在真实世界中的宽度、高度,w、h为成像的宽度与高度,Z_w是目标对象与摄像装置之间的距离。
第二步:基于摄像装置与目标对象之间的距离、所检测出的类别对应的对象在现实世界中的大小、摄像装置与参考面的距离、以及第一位置区域的边界坐标,基于道路小目标在道路平面的假设约束,确定图像中的第二位置区域。
具体的,该第一位置区域的边界坐标可以是第一区域的下边界坐标。其可以包括多个坐标点,也可以包括一个坐标点。其中,当第一位置区域的下边界坐标包括一个坐标点时,该坐标点可以为下边界的中点,也可以为下边界与其他边界(例如左边界或右边界)交汇的顶点。
这里,通过所确定出的摄像装置与目标对象之间的距离、所检测出的类别对应的对象在现实世界中的大小、摄像装置与参考面的距离、以及第一位置区域的边界坐标,基于道路小目标处于地面的前提假设,采用公式(3)可以反推出步骤S603中所确定出的类别的对象呈现在图像中的位置,该位置也即是第二位置区域。其中,公式(3)中可以确定出该类别的对象呈现在图像中的高度,该高度是沿重力方向的高度;然后,基于现实世界中该类别的对象的宽度-高度比例,可以确定出该类别的对象呈现在图像中的宽度。该第二位置区域即是以第一位置区域的下边界作为底边(例如以第一位置区域的下边界的中点作为底边的中点,或者以第一位置区域的下边界的其中一个顶点作为底边的第一个起始点),以公式(3)确定出的宽度作为第二位置区域左边界和右边界的宽度,以所确定出的高度作为第二位置区域下边界和上边界的宽度,从而可以确定出第二位置区域在图像中的具体坐标范围。
(公式(3),原文以附图公式 PCTCN2021081090-appb-000001 的形式给出,此处未能完整还原)
这里,(u、v)为图像中的第一位置区域中的某一固定点(例如下边界的中点、下边界的顶点)在图像坐标系中的坐标,(X_w, Y_w, Z_w)为现实中目标对象的某一点(如行人脚部与地面接触点、汽车车轮与地面接触点)在世界坐标系中的坐标;f_x、f_y分别是在摄像装置坐标系下、摄像装置x轴、y轴方向的焦距,d_x、d_y分别是在摄像装置坐标系下、感光元件上x轴、y轴方向感光单元的尺寸。u_0、v_0是图像平面中心(图像中心像素点坐标),h_w为目标在真实世界中的高度,h_com是摄像平面与参考面之间的距离,公式(3)中的成像高度项(原文为附图公式 PCTCN2021081090-appb-000002)为根据参考距离和目标高度推导的成像的高度。需要说明的是,采用公式(3)确定第二位置区域时,通常图像中的第一位置区域中的某一固定点(u、v)与现实中目标对象的某一点(X_w, Y_w, Z_w)具有映射关系。也即是说,图像中的点(u、v)与现实中的点(X_w, Y_w, Z_w)例如均用于指示目标对象的脚部,或者对于车辆来说同一轮胎的位置。
S607,检测第一位置区域和第二位置区域之间的误差。
这里的检测第一位置区域和第二位置区域之间的误差,可以检测第一位置区域的高度和第二位置区域的高度之间的误差,或者第一位置区域的宽度和第二位置区域的宽度之间的误差,或者检测第一位置区域的宽度与高度的比值和第二位置区域的宽度与高度之间的比值的误差。
通过确定第一位置区域和第二位置区域之间的误差,可以反推出步骤S603中所检测出的目标对象所属的类别是否准确。当第一位置区域和第二位置区域之间的误差大于预设阈值时,可以认为步骤S603所检测出的目标对象所属的类别不可信;例如,当步骤S603判断出第一位置区域的目标对象为人,而第一位置区域的上下边界(也即人的高度)为200px,基于步骤S605确定出的第二位置区域的上下边界(也即人的高度)为400px。或者,第一位置区域的左右边界(也即人的宽度)为80px,基于步骤S605确定出的第二位置区域的左右边界(也即人的宽度)为200px。此时第一位置区域和第二位置区域之间的误差较大,由此可以认为步骤S603所检测出的目标对象所属的类别不可信。当第一位置区域和第二位置区域之间的误差小于预设阈值时,也即第一位置区域和第二位置区域之间的误差较小,此时可以认为步骤S603所检测出的目标对象所属的类别可信。
S608,基于第一位置区域和第二位置区域之间的误差,修正第一置信度,得到目标所 属的类别的第二置信度。
这里的第二置信度用于指示目标对象所述的类别是否可信。当上述误差较大时,可以将第二该置信度设置较低的值;当上述误差较小时,可以将第二置信度设置较高的值。
当第二置信度高于预设阈值(例如0.7)时,可以认为步骤S603所检测出的目标对象所属的类别可信;当第二置信度小于等于预设阈值(例如0.7)时,可以认为步骤S603所检测出的目标对象所属的类别不可信。
结合图7、图8a、图8b所示的应用场景,对图6所述的目标检测方法中步骤S606-步骤S608的实现进行具体描述。
图7为拍摄装置获取到的图像C。假设采用步骤S601-步骤S605的步骤,已经检测出图像C中候选图像区域d呈现有行人,且概率为0.6。从图7中可以看出,候选区域d实际呈现的对象为树木,由于候选区域d距离摄像装置较远,其为道路上的小目标,因此容易引起误判。这里的候选区域d也即为上述第一位置区域。接着,目标检测装置可以确定候选区域d中呈现的对象沿方向U的长度。由于通过步骤S603检测出候选区域d中呈现的对象为行人,此时目标检测装置可以确定出假设候选区域d中呈现的对象为行人时,行人与拍摄装置之间的距离。然后,反推出行人呈现在图像中的第二位置区域。
其中第二位置区域的确定方式参考图8a-图8b。假设摄像装置的焦距为fx、fy,世界坐标系、相机坐标系和图像坐标系如图8a所示,其中,世界坐标系中,Y轴沿重力方向,Z轴沿车行进的方向,X轴沿与Y轴和Z轴垂直的方向。世界坐标系中的Y轴映射至图像坐标系中为V轴,世界坐标系中的X轴映射至图像坐标系中为U轴。世界坐标系中的X轴映射至摄像装置坐标系中为x轴,世界坐标系中的Y轴映射至摄像装置坐标系中为y轴,世界坐标系中的Z轴映射至摄像装置坐标系中为z轴。在进行计算过程中,由于图像为二维坐标,在进行映射过程中不考虑世界坐标系中的Z轴。摄像装置与地面的距离为h com
假设候选区域c呈现的目标对象是行人为正确的,此时,通过上述公式(1)或者公式(2)可以反推出目标对象与摄像装置之间的距离。假设目标对象在图8a所示的位置F处。这里需要说明的是,位置F处的目标对象是假设的,不一定是真实存在的。其作用是用于验证步骤S603对候选区域d呈现的对象推理的正确性。然后,通过查询预先设置的表格,可以确定出在F位置处的行人的高度为h w。然后,通过选定图8b中所示的现实世界中,行人与地面接触的点(Xw,Yw,0)的坐标,将该坐标映射至图像C中为点(u,v),然后通过上述公式(3)确定出行人呈现在图像C中的高度为h’,其在图像中的第二位置区域为如图8b所示。
接着,目标检测装置可以对比图8a所示的第一位置区域和图8b所示的第二位置区域之间的坐标差异。从图8a-图8b中可以看出第一位置区域和第二位置区域之间的差值较大。从而,目标检测装置可以基于所确定出的第一位置区域和第二位置区域之间的差值,确定目标检测模型所检测出的第一位置区域呈现的目标对象为行人的第二置信度。例如,该第二置信度可以为0.1。
从图6所述的目标检测方法中可以看出,与图3所示的目标检测方法不同的是,本实施例在第一置信度大于预设阈值时,通过确定第二位置区域,然后基于第一位置区域和第二位置区域之间的误差检测出误判的目标对象(例如将树木误判成行人),从而可以进一 步提高道路上的小目标检测的准确性。
基于上述各实施例,在一些可能的实现方式中,还可以包括对目标检测模型进行优化的步骤。具体的,随机选取训练样本集,该训练样本集中包括多个训练样本图像。将训练样本图像输入至目标检测模型,得到训练样本图像中的对象所属的类别和第一位置区域。然后,采用步骤S604所示的第二位置区域的确定方法,确定各样本图像中的第二位置区域,最后,采用第二预设损失函数和反向传播算法,迭代调整目标检测模型各层的权重,以对目标检测模型进行优化。其中,第二预设损失函数用于指示第一位置区域和第二位置区域之间的差异。
通过对目标检测模型进行优化,可以进一步提高目标检测模型的检测准确度,也即提高道路目标检测的准确性,为后续自动驾驶车辆进行障碍物检测与躲避等提供保障。
请继续参考图9,其示出了本申请实施例提供的目标检测装置900。
如图9所示,目标检测装置900包括:获取模块901,用于利用摄像装置获取图像;标定模块902,用于基于所述摄像装置的参数和预设行驶路径,标定出所述图像中的感兴趣区域;第一检测模块903,用于利用目标检测算法对所述图像进行检测,得到所述图像中的目标对象所属的类别、所述目标对象在所述图像中的第一位置区域和所述目标对象所属的类别的置信度;第一修正模块904,用于基于所述第一位置区域与所述感兴趣区域之间的相对位置关系,修正所述目标对象所属的类别的置信度,得到第一置信度。
此外,目标检测装置900还包括:确定模块905,用于响应于所述第一置信度大于预设阈值,基于所述摄像装置的参数、所述第一位置区域在图像中的边界坐标、预先设置的所述类别对应的对象在现实世界中的大小,确定所述图像中的第二位置区域;第二检测模块906,用于检测所述第一位置区域和所述第二位置区域之间的误差;第二修正模块907,用于基于所述误差,修正所述第一置信度,得到所述目标所属的类别的第二置信度。
进一步的,所述摄像装置的参数包括以下至少一项:所述摄像装置的焦距、所述摄像装置与参考面的距离、由所述摄像装置坐标系转换为图像坐标系的转换矩阵和感光元件中的感光单元的尺寸。
进一步的,所述确定模块,包括:第一确定子模块,用于基于所述摄像装置的焦距、所述摄像装置与参考面的距离、由所述摄像装置坐标系转换为图像坐标系的转换矩阵、感光元件中的感光单元的尺寸和所述第一位置区域在图像中的边界坐标,确定所述摄像装置与所述目标对象之间的距离;第二确定子模块,用于基于所述摄像装置与目标对象之间的距离、所检测出的类别对应的对象在现实世界中的大小、所述摄像装置与参考面的距离、以及所述第一位置区域的边界坐标,确定所述图像中的第二位置区域。
进一步的,所述目标对象所属的类别是将所述目标对象的特征与多个预设候选类别对应的对象的特征进行匹配,基于匹配结果,从所述预设候选类别中选择出的。
进一步的,第一检测模块包括:设置子模块,用于在预先训练的目标检测模型中设置标定参数,所述标定参数用于指示所述目标检测模型在所述图像中标定出多个候选区域;检测子模块,用于将所述图像输入至所述目标检测模型,得到所述目标检测模型的输出结果,所述输出结果用于指示各所述候选区域中是否呈现有预设候选类别的对象和所述目标对象所属的类别的置信度,其中,所述目标检测模型是基于训练样本和用于进行候选区域标定的标定参数,对神经网络训练得到的。
进一步的,所述图像中的多个候选区域是基于约束条件预先确定的;所述约束条件包括:各所述预设候选类别对应的对象呈现在所述图像中的区域范围、以及各所述预设候选类别对应的对象在所述图像中的成像大小范围。
进一步的,所述设置子模块具体用于:在所述图像中标定出初始候选区域;利用所述约束条件对所述初始候选区域进行筛选,基于筛选结果,得到所述多个候选区域。
进一步的,所述目标检测装置还包括模型优化模块,所述模型优化模块具体用于:获取训练样本集,所述训练样本集包括多个样本图像,各所述样本图像中呈现有目标对象;将样本图像输入至所述目标检测模型,得各样本图像中的目标对象所属的类别和目标对象在样本图像中的第一位置区域,基于样本图像中的目标对象所属的类别、第一置区域的边界坐标以及用于拍摄样本图像的拍摄设备的参数,确定各样本图像中的第二位置区;利用预设损失函数确定各训练样本中第一位置区域和第二位置区域之间的偏差,基于所述偏差,迭代调整所述目标检测模型,得到优化后的目标检测模型。
需要说明的是,上述装置之间的信息交互、执行过程等内容,由于与本申请方法实施例基于同一构思,其具体功能及带来的技术效果,具体可参见方法实施例部分,此处不再赘述。
在采用集成的模块的情况下,目标检测装置900可以包括处理器、存储器和通信模块。其中,处理器可以对目标检测装置900的动作进行控制管理,例如,可以用于支持目标检测装置900执行上述各个模块执行的步骤。存储器可以用于支持目标检测装置900执行存储程序代码和数据等。通信模块,可以用于目标检测装置900与其他设备的通信。
其中,处理器可以实现或执行结合本申请公开内容所描述的各种示例性的逻辑模块。处理器也可以是实现计算功能的组合,例如包含一个或多个微处理器组合,例如包括中央处理单元(Central Processing Unit,CPU),还可以包括其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、或分立硬件组件等。通用处理器可以是微处理器、微控制器或者是任何常规的处理器等。
还应理解,本申请实施例中提及的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(Synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DR RAM)。
通信模块具体可以为射频电路、蓝牙芯片、Wi-Fi芯片等与其他电子设备交互的设备。
本实施例还提供一种计算机可读存储介质,该计算机可读存储介质中存储有计算机指 令,当该计算机指令在计算机上运行时,使得计算机执行上述相关方法步骤实现上述实施例中的温度测量方法。
本实施例还提供了一种计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行上述相关步骤,以实现上述实施例中的温度测量方法。
另外,本申请的实施例还提供一种装置,这个装置具体可以是芯片,组件或模块,该装置可包括耦合的处理器和存储器;其中,存储器用于存储计算机执行指令,当装置运行时,处理器可执行存储器存储的计算机执行指令,以使芯片执行上述温度测量方法。
其中,本实施例提供的处理器、计算机可读存储介质、计算机程序产品或芯片均用于执行上文所提供的对应的方法,因此,其所能达到的有益效果可参考上文所提供的对应的方法中的有益效果,此处不再赘述。
通过以上实施方式的描述,所属领域的技术人员可以了解到,为描述的方便和简洁,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个模块或组件可以结合或者可以集成到另一个装置,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置的间接耦合或通信连接,可以是电性,机械或其它的形式。
作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是一个物理单元或多个物理单元,即可以位于一个地方,或者也可以分布到多个不同地方。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该软件产品存储在一个存储介质中,包括若干指令用以使得一个设备(可以是单片机,芯片等)或处理器(processor)执行本申请各个实施例方法的全部或部分步骤。而前述的可读存储介质包括:U盘、移动硬盘、只读存储器(read only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (21)

  1. 一种目标检测方法,其特征在于,所述方法包括:
    利用摄像装置获取图像;
    基于所述摄像装置的参数和预设行驶路径,标定出所述图像中的感兴趣区域;
    利用目标检测算法对所述图像进行检测,得到所述图像中的目标对象所属的类别、所述目标对象在所述图像中的第一位置区域和所述目标对象所属的类别的置信度;
    基于所述第一位置区域与所述感兴趣区域之间的相对位置关系,修正所述目标对象所属的类别的置信度,得到第一置信度。
  2. 根据权利要求1所述的目标检测方法,其特征在于,所述方法还包括:
    响应于所述第一置信度大于预设阈值,基于所述摄像装置的参数、所述第一位置区域在图像中的边界坐标、预先设置的所述类别对应的对象在现实世界中的大小,确定所述图像中的第二位置区域;
    检测所述第一位置区域和所述第二位置区域之间的误差;
    基于所述误差,修正所述第一置信度,得到所述目标对象所属的类别的第二置信度。
  3. 根据权利要求2所述的目标检测方法,其特征在于,所述摄像装置的参数包括以下至少一项:所述摄像装置的焦距、所述摄像装置与参考面的距离、由所述摄像装置坐标系转换为图像坐标系的转换矩阵和感光元件中的感光单元的尺寸。
  4. 根据权利要求3所述的目标检测方法,其特征在于,所述基于所述摄像装置的参数、所述第一位置区域在图像中的边界坐标、预先设置的所述类别对应的对象在现实世界中的大小,确定所述图像中的第二位置区域,包括:
    基于所述摄像装置的焦距、所述摄像装置与参考面的距离、由所述摄像装置坐标系转换为图像坐标系的转换矩阵、感光元件中的感光单元的尺寸和所述第一位置区域在所述图像中的边界坐标,确定所述摄像装置与目标对象之间的距离;
    基于所述摄像装置与目标对象之间的距离、所检测出的类别对应的对象在现实世界中的大小、所述摄像装置与参考面的距离、以及所述第一位置区域的边界坐标,确定所述图像中的第二位置区域。
  5. 根据权利要求1至4任一项所述的目标检测方法,其特征在于,所述目标对象所属的类别是将所述目标对象的特征与多个预设候选类别对应的对象的特征进行匹配,基于匹配结果,从预设候选类别中选择出的。
  6. 根据权利要求5所述的目标检测方法,其特征在于,所述利用目标检测算法对所述图像进行检测,得到所述图像中的目标对象所属的类别、所述目标对象在所述图像中的第一位置区域和所述目标对象所属的类别的置信度,包括:
    在预先训练的目标检测模型中设置标定参数,所述标定参数用于指示所述目标检测模型在所述图像中标定出多个候选区域;
    将所述图像输入至所述目标检测模型,得到所述目标检测模型的输出结果,所述输出结果用于指示各所述候选区域中是否呈现有所述预设候选类别的对象和所述目标对象所属的类别的置信度,其中,所述目标检测模型是基于训练样本和用于进行候选区域标定的标定参数,对神经网络训练得到的。
  7. 根据权利要求6所述的目标检测方法,其特征在于,所述图像中的多个候选区域是基于约束条件预先确定的;
    所述约束条件包括:各所述预设候选类别对应的对象呈现在所述图像中的区域范围、以及各所述预设候选类别对应的对象在所述图像中的成像大小范围。
  8. 根据权利要求7所述的目标检测方法,其特征在于,所述确定所述图像中的多个候选区域,包括:
    在所述图像中标定出初始候选区域;
    利用所述约束条件对所述初始候选区域进行筛选,基于筛选结果,得到所述多个候选区域。
  9. 根据权利要求6至8任一项所述的目标检方法,其特征在于,所述方法还包括对所述目标检测模型的优化步骤,所述优化步骤包括:
    获取训练样本集,所述训练样本集包括多个样本图像,各所述样本图像中呈现有目标对象;
    将样本图像输入至所述目标检测模型,得到各样本图像中的目标对象所属的类别和目标对象在样本图像中的第一位置区域,基于样本图像中的目标对象所属的类别、第一置区域的边界坐标以及用于拍摄样本图像的拍摄设备的参数,确定各样本图像中的第二位置区;
    利用预设损失函数确定各训练样本中第一位置区域和第二位置区域之间的偏差,基于所述偏差,迭代调整所述目标检测模型,得到优化后的目标检测模型。
  10. 一种目标检测装置,其特征在于,包括:
    获取模块,用于利用摄像装置获取图像;
    标定模块,用于基于所述摄像装置的参数和预设行驶路径,标定出所述图像中的感兴趣区域;
    第一检测模块,用于利用目标检测算法对所述图像进行检测,得到所述图像中的目标对象所属的类别、所述目标对象在所述图像中的第一位置区域和所述目标对象所属的类别的置信度;
    第一修正模块,用于基于所述第一位置区域与所述感兴趣区域之间的相对位置关系,修正所述目标对象所属的类别的置信度,得到第一置信度。
  11. 根据权利要求10所述的目标检测装置,其特征在于,所述装置还包括:
    确定模块,用于响应于所述第一置信度大于预设阈值,基于所述摄像装置的参数、所述第一位置区域在图像中的边界坐标、预先设置的所述类别对应的对象在现实世界中的大小,确定所述图像中的第二位置区域;
    第二检测模块,用于检测所述第一位置区域和所述第二位置区域之间的误差;
    第二修正模块,用于基于所述误差,修正所述第一置信度,得到所述目标所属的类别的第二置信度。
  12. 根据权利要求11所述的目标检测装置,其特征在于,所述摄像装置的参数包括以下至少一项:所述摄像装置的焦距、所述摄像装置与参考面的距离、由所述摄像装置坐标系转换为图像坐标系的转换矩阵和感光元件中的感光单元的尺寸。
  13. 根据权利要求12所述的目标检测装置,其特征在于,所述确定模块,包括:
    第一确定子模块,用于基于所述摄像装置的焦距、所述摄像装置与参考面的距离、由 所述摄像装置坐标系转换为图像坐标系的转换矩阵、感光元件中的感光单元的尺寸和所述第一位置区域在图像中的边界坐标,确定所述摄像装置与所述目标对象之间的距离;
    第二确定子模块,用于基于所述摄像装置与目标对象之间的距离、所检测出的类别对应的对象在现实世界中的大小、所述摄像装置与参考面的距离、以及所述第一位置区域的边界坐标,确定所述图像中的第二位置区域。
  14. 根据权利要求10至13任一项所述的目标检测装置,其特征在于,所述目标对象所属的类别是将所述目标对象的特征与多个预设候选类别对应的对象的特征进行匹配,基于匹配结果,从所述预设候选类别中选择出的。
  15. 根据权利要求14所述的目标检测装置,其特征在于,所述第一检测模块,包括:
    设置子模块,用于在预先训练的目标检测模型中设置标定参数,所述标定参数用于指示所述目标检测模型在所述图像中标定出多个候选区域;
    检测子模块,用于将所述图像输入至所述目标检测模型,得到所述目标检测模型的输出结果,所述输出结果用于指示各所述候选区域中是否呈现有预设候选类别的对象和所述目标对象所属的类别的置信度,其中,所述目标检测模型是基于训练样本和用于进行候选区域标定的标定参数,对神经网络训练得到的。
  16. 根据权利要求15所述的目标检测装置,其特征在于,所述图像中的多个候选区域是基于约束条件预先确定的;
    所述约束条件包括:各所述预设候选类别对应的对象呈现在所述图像中的区域范围、以及各所述预设候选类别对应的对象在所述图像中的成像大小范围。
  17. 根据权利要求16所述的目标检测装置,其特征在于,所述设置子模块具体用于:
    在所述图像中标定出初始候选区域;
    利用所述约束条件对所述初始候选区域进行筛选,基于筛选结果,得到所述多个候选区域。
  18. 根据权利要求15至17任一项所述的目标检测装置,其特征在于,所述目标检测装置还包括模型优化模块,所述模型优化模块具体用于:
    获取训练样本集,所述训练样本集包括多个样本图像,各所述样本图像中呈现有目标对象;
    将样本图像输入至所述目标检测模型,得到各样本图像中的目标对象所属的类别和目标对象在样本图像中的第一位置区域,基于样本图像中的目标对象所属的类别、第一置区域的边界坐标以及用于拍摄样本图像的拍摄设备的参数,确定各样本图像中的第二位置区;
    利用预设损失函数确定各训练样本中第一位置区域和第二位置区域之间的偏差,基于所述偏差,迭代调整所述目标检测模型,得到优化后的目标检测模型。
  19. 一种电子设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机程序,其特征在于,所述处理器执行所述计算机程序时,使得电子设备实现如权利要求1至9任一项所述的方法。
  20. 一种可读存储介质,其特征在于,包括计算机指令,当所述计算机指令在计算机上运行时,使得所述计算机执行如权利要求1至9中任一项所述的方法。
  21. 一种计算机程序产品,其特征在于,当所述计算机程序产品在计算机上运行时,使得所述计算机执行如权利要求1至9中任一项所述的方法。
PCT/CN2021/081090 2020-05-14 2021-03-16 目标检测方法和装置 WO2021227645A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21803599.6A EP4141737A4 (en) 2020-05-14 2021-03-16 TARGET DETECTION METHOD AND DEVICE
US17/985,479 US20230072730A1 (en) 2020-05-14 2022-11-11 Target detection method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010408685.9A CN113673282A (zh) 2020-05-14 2020-05-14 目标检测方法和装置
CN202010408685.9 2020-05-14

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/985,479 Continuation US20230072730A1 (en) 2020-05-14 2022-11-11 Target detection method and apparatus

Publications (1)

Publication Number Publication Date
WO2021227645A1 true WO2021227645A1 (zh) 2021-11-18

Family

ID=78526330

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/081090 WO2021227645A1 (zh) 2020-05-14 2021-03-16 目标检测方法和装置

Country Status (4)

Country Link
US (1) US20230072730A1 (zh)
EP (1) EP4141737A4 (zh)
CN (1) CN113673282A (zh)
WO (1) WO2021227645A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114816156A (zh) * 2022-05-12 2022-07-29 网易(杭州)网络有限公司 模型检测方法、装置、终端设备及存储介质
CN115019112A (zh) * 2022-08-09 2022-09-06 威海凯思信息科技有限公司 基于图像的目标对象检测方法、装置及电子设备
CN115359436A (zh) * 2022-08-18 2022-11-18 中国人民公安大学 基于遥感图像的排查方法、装置、设备及存储介质
CN115526809A (zh) * 2022-11-04 2022-12-27 山东捷瑞数字科技股份有限公司 一种图像处理方法、装置及电子设备和存储介质
CN116403284A (zh) * 2023-04-07 2023-07-07 北京奥康达体育产业股份有限公司 一种基于蓝牙传输技术的智慧跑步考核训练系统
CN116935290A (zh) * 2023-09-14 2023-10-24 南京邮电大学 机场场景下高分辨率阵列摄像机异构目标检测方法及系统

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114627443B (zh) * 2022-03-14 2023-06-09 小米汽车科技有限公司 目标检测方法、装置、存储介质、电子设备及车辆
CN114863386A (zh) * 2022-03-30 2022-08-05 广州文远知行科技有限公司 交通信号灯的检测方法、装置及电子设备
CN116363435B (zh) * 2023-04-03 2023-10-27 盐城工学院 一种基于深度学习的遥感图像目标检测系统及方法
CN116309587A (zh) * 2023-05-22 2023-06-23 杭州百子尖科技股份有限公司 一种布料瑕疵检测方法、装置、电子设备及存储介质
CN116503491A (zh) * 2023-06-26 2023-07-28 安徽大学 一种基于相机标定和视觉的机器狗障碍物测距和避障方法
CN117011365B (zh) * 2023-10-07 2024-03-15 宁德时代新能源科技股份有限公司 尺寸测量方法、装置、计算机设备和存储介质
CN117472069B (zh) * 2023-12-28 2024-03-26 烟台宇控软件有限公司 一种用于输电线路检测的机器人控制方法及系统

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100021010A1 (en) * 2008-07-25 2010-01-28 Gm Global Technology Operations, Inc. System and Method for detecting pedestrians
US20120300078A1 (en) * 2010-01-28 2012-11-29 Hitachi, Ltd Environment recognizing device for vehicle
US20150091715A1 (en) * 2013-09-27 2015-04-02 Fuji Jukogyo Kabushiki Kaisha Vehicle external environment recognition device
US20170053169A1 (en) * 2015-08-20 2017-02-23 Motionloft, Inc. Object detection and analysis via unmanned aerial vehicle
CN107862287A (zh) * 2017-11-08 2018-03-30 吉林大学 一种前方小区域物体识别及车辆预警方法
CN109583321A (zh) * 2018-11-09 2019-04-05 同济大学 一种基于深度学习的结构化道路中小物体的检测方法
CN110866420A (zh) * 2018-08-28 2020-03-06 天津理工大学 一种基于光场相机和hog、svm的2d欺骗性行人识别方法

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101284798B1 (ko) * 2009-12-08 2013-07-10 한국전자통신연구원 단일 카메라 영상 기반의 객체 거리 및 위치 추정 장치 및 방법
JP6540009B2 (ja) * 2013-12-27 2019-07-10 株式会社リコー 画像処理装置、画像処理方法、プログラム、画像処理システム

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100021010A1 (en) * 2008-07-25 2010-01-28 Gm Global Technology Operations, Inc. System and Method for detecting pedestrians
US20120300078A1 (en) * 2010-01-28 2012-11-29 Hitachi, Ltd Environment recognizing device for vehicle
US20150091715A1 (en) * 2013-09-27 2015-04-02 Fuji Jukogyo Kabushiki Kaisha Vehicle external environment recognition device
US20170053169A1 (en) * 2015-08-20 2017-02-23 Motionloft, Inc. Object detection and analysis via unmanned aerial vehicle
CN107862287A (zh) * 2017-11-08 2018-03-30 吉林大学 一种前方小区域物体识别及车辆预警方法
CN110866420A (zh) * 2018-08-28 2020-03-06 天津理工大学 一种基于光场相机和hog、svm的2d欺骗性行人识别方法
CN109583321A (zh) * 2018-11-09 2019-04-05 同济大学 一种基于深度学习的结构化道路中小物体的检测方法

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114816156A (zh) * 2022-05-12 2022-07-29 网易(杭州)网络有限公司 模型检测方法、装置、终端设备及存储介质
CN115019112A (zh) * 2022-08-09 2022-09-06 威海凯思信息科技有限公司 基于图像的目标对象检测方法、装置及电子设备
CN115359436A (zh) * 2022-08-18 2022-11-18 中国人民公安大学 基于遥感图像的排查方法、装置、设备及存储介质
CN115526809A (zh) * 2022-11-04 2022-12-27 山东捷瑞数字科技股份有限公司 一种图像处理方法、装置及电子设备和存储介质
CN116403284A (zh) * 2023-04-07 2023-07-07 北京奥康达体育产业股份有限公司 一种基于蓝牙传输技术的智慧跑步考核训练系统
CN116403284B (zh) * 2023-04-07 2023-09-12 北京奥康达体育产业股份有限公司 一种基于蓝牙传输技术的智慧跑步考核训练系统
CN116935290A (zh) * 2023-09-14 2023-10-24 南京邮电大学 机场场景下高分辨率阵列摄像机异构目标检测方法及系统
CN116935290B (zh) * 2023-09-14 2023-12-12 南京邮电大学 机场场景下高分辨率阵列摄像机异构目标检测方法及系统

Also Published As

Publication number Publication date
US20230072730A1 (en) 2023-03-09
EP4141737A4 (en) 2023-10-18
CN113673282A (zh) 2021-11-19
EP4141737A1 (en) 2023-03-01

Similar Documents

Publication Publication Date Title
WO2021227645A1 (zh) 目标检测方法和装置
US11670193B2 (en) Extrinsic parameter of on-board sensor
KR102565533B1 (ko) 자율 주행을 위한 항법 정보의 융합 프레임워크 및 배치 정렬
JP7073315B2 (ja) 乗物、乗物測位システム、及び乗物測位方法
US11458912B2 (en) Sensor validation using semantic segmentation information
US10825186B2 (en) Information processing device, information processing method, and computer program product
WO2020259284A1 (zh) 一种障碍物检测方法及装置
CN110945379A (zh) 从地图数据、激光和相机确定偏航误差
CN110795984A (zh) 信息处理方法、信息处理装置及程序记录介质
US10955857B2 (en) Stationary camera localization
KR20210061722A (ko) 고정밀 지도 제작 방법, 고정밀 지도 제작 장치, 컴퓨터 프로그램 및 컴퓨터 판독 가능한 기록 매체
US11200432B2 (en) Method and apparatus for determining driving information
JP6758160B2 (ja) 車両位置検出装置、車両位置検出方法及び車両位置検出用コンピュータプログラム
JP2017181476A (ja) 車両位置検出装置、車両位置検出方法及び車両位置検出用コンピュータプログラム
US10991155B2 (en) Landmark location reconstruction in autonomous machine applications
US20210080264A1 (en) Estimation device, estimation method, and computer program product
CN115718304A (zh) 目标对象检测方法、装置、车辆及存储介质
CN116385997A (zh) 一种车载障碍物精确感知方法、系统及存储介质
WO2022133986A1 (en) Accuracy estimation method and system
WO2021115273A1 (zh) 通信方法和装置
CN215495425U (zh) 复眼摄像系统及使用复眼摄像系统的车辆
TWM618998U (zh) 複眼攝像系統及使用複眼攝像系統的車輛
CN115393821A (zh) 一种目标检测方法和装置
CN115440067A (zh) 复眼摄像系统、使用复眼摄像系统的车辆及其影像处理方法
TW202248963A (zh) 複眼攝像系統,使用複眼攝像系統的車輛及其影像處理方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21803599

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021803599

Country of ref document: EP

Effective date: 20221122

NENP Non-entry into the national phase

Ref country code: DE