WO2022148143A1 - Target detection method and apparatus - Google Patents

Target detection method and apparatus (一种目标检测方法及装置)

Info

Publication number
WO2022148143A1
Authority
WO
WIPO (PCT)
Prior art keywords: interest, region, regions, coordinate system, dimensional model
Application number
PCT/CN2021/131569
Other languages: English (en), French (fr)
Inventors: 云一宵, 郑迪威, 马志贤, 苏惠荞
Original Assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2022148143A1


Classifications

    • G06F 18/00 Pattern recognition; G06F 18/20 Analysing; G06F 18/24 Classification techniques
    • G06T 7/00 Image analysis; G06T 7/20 Analysis of motion; G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/269 Analysis of motion using gradient-based methods
    • G06T 7/60 Analysis of geometric attributes; G06T 7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G06V 10/00 Arrangements for image or video recognition or understanding; G06V 10/20 Image preprocessing; G06V 10/24 Aligning, centring, orientation detection or correction of the image
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/70 Recognition or understanding using pattern recognition or machine learning; G06V 10/764 Classification, e.g. of video objects
    • G06V 20/00 Scenes; scene-specific elements; G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle; G06V 20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V 20/60 Type of objects; G06V 20/64 Three-dimensional objects
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement; G06T 2207/20092 Interactive image processing based on input by user; G06T 2207/20104 Interactive definition of region of interest [ROI]
    • G06T 2207/30248 Vehicle exterior or interior; G06T 2207/30252 Vehicle exterior; vicinity of vehicle; G06T 2207/30261 Obstacle

Definitions

  • The present application relates to the technical field of image processing, and in particular to a target detection method and apparatus.
  • Intelligent terminals such as intelligent transportation equipment, smart home equipment, and robots are gradually entering people's daily life.
  • Sensors play a very important role in intelligent terminals.
  • Various sensors installed on an intelligent terminal, such as millimeter-wave radar, lidar, cameras, and ultrasonic radar, perceive the surrounding environment while the terminal moves: they collect data; identify, track, measure the speed of, and measure the distance to moving objects; identify and locate objects in stationary scenes, such as lane lines in traffic scenes; and, combined with navigator and map data, support path planning and other behavior control.
  • The embodiments of the present application disclose a target detection method and apparatus, which can detect obstacles in a traffic scene in real time, provide important information for subsequent path planning, and improve the capability of the advanced driving assistance system in automatic driving or assisted driving.
  • A first aspect of the embodiments of the present application discloses a target detection method, including: acquiring a first image; processing the first image to obtain Q regions of interest, and determining the coordinates in the image coordinate system corresponding to the reference point in each region of interest, where Q is a positive integer; determining, according to the coordinates in the image coordinate system corresponding to the reference point in each region of interest, the coordinates in the vehicle body coordinate system corresponding to the reference point in each region of interest; determining a three-dimensional model of a first target object; determining, according to the coordinates in the vehicle body coordinate system corresponding to the reference point in each region of interest and the vertex set of the three-dimensional model of the first target object, at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system; projecting the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system onto the image coordinate system to obtain Q pixel regions; and determining the detection result of the target according to the Q regions of interest and the Q pixel regions.
  • A three-dimensional model coordinate system is defined to obtain the three-dimensional model of the first target object.
  • The coordinates of the three-dimensional model of the first target object in the vehicle body coordinate system are then obtained.
  • In this way, even without a large number of pre-collected training samples, the embodiments of the present application can still detect obstacle objects in a traffic scene.
  • The first target object is detected by converting the reference point of each region of interest from the image coordinate system to the vehicle body coordinate system, and then placing the three-dimensional model of the first target object at the position in the vehicle body coordinate system corresponding to the reference point.
  • The coordinates of the three-dimensional model in the vehicle body coordinate system are then projected into the image coordinate system and matched. That is to say, the embodiments of the present application process and analyze the imaging features of the first target object to achieve detection, without requiring a large number of training samples to be collected and labeled in advance for training, and with low computational complexity.
  • Determining the detection result of the target according to the Q regions of interest and the Q pixel regions includes: screening the Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest, where each pixel region corresponds to one region of interest, R is a positive integer, and R is less than or equal to Q; determining the R circumscribed rectangles corresponding to the R regions of interest; and determining the detection result of the target according to the R circumscribed rectangles.
  • Obtaining R regions of interest by screening the Q regions of interest can effectively remove image noise and improve the accuracy of the target detection result.
  • When determining the detection result of the target according to the R circumscribed rectangles, a non-maximum suppression algorithm can be used to process the R circumscribed rectangles and remove redundant circumscribed rectangles before the detection result is finally determined, which speeds up target detection.
  • The screening of the Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest includes at least one of the following manners: when the ratio of the area of a first region of interest among the Q regions of interest to the convex envelope area of the first region of interest is greater than a first preset value, the first region of interest is taken as one of the R regions of interest;
  • when the ratio of the convex envelope aspect ratio of the first region of interest among the Q regions of interest to the aspect ratio of the pixel region corresponding to the first region of interest is greater than a second preset value and less than a third preset value, the first region of interest is taken as one of the R regions of interest; when the ratio of the convex envelope area of the first region of interest among the Q regions of interest to the area of the pixel region corresponding to the first region of interest is greater than the second preset value and less than the third preset value, the first region of interest is taken as one of the R regions of interest;
  • and when the intersection-over-union of the convex envelope contour of the first region of interest and the contour of the pixel region corresponding to the first region of interest is greater than a fourth preset value, the first region of interest is taken as one of the R regions of interest.
  • The screening of the Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest may also include the following manner: using a pre-trained classifier to evaluate and score the content in the circumscribed rectangle of the first region of interest among the Q regions of interest to obtain a first score; when the first score is higher than a fifth preset value, the first region of interest is taken as one of the R regions of interest.
  • Determining the detection result of the target according to the R circumscribed rectangles includes: calculating the areas of the R circumscribed rectangles; and determining the detection result of the target according to the areas of the R circumscribed rectangles.
  • The detection result of the target can be quickly determined from the areas of the R circumscribed rectangles, which speeds up target detection.
  • Determining the detection result of the target according to the R circumscribed rectangles may also include: using a pre-trained classifier to evaluate and score the contents of the R circumscribed rectangles to obtain R scores; and determining the detection result of the target according to the R scores.
  • A second aspect of the embodiments of the present application discloses a target detection apparatus, comprising: an acquisition module, configured to acquire a first image; and a processing module, configured to process the first image to obtain Q regions of interest and determine the coordinates in the image coordinate system corresponding to the reference point in each region of interest, where Q is a positive integer. The processing module is configured to determine, according to the coordinates in the image coordinate system corresponding to the reference point in each region of interest, the coordinates in the vehicle body coordinate system corresponding to the reference point in each region of interest; to determine the three-dimensional model of the first target object; to determine, according to the coordinates in the vehicle body coordinate system corresponding to the reference point in each region of interest and the vertex set of the three-dimensional model of the first target object, at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system; to project the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system onto the image coordinate system to obtain Q pixel regions; and to determine the detection result of the target according to the Q regions of interest and the Q pixel regions.
  • The processing module is further configured to screen the Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest, where each pixel region corresponds to one region of interest, R is a positive integer, and R is less than or equal to Q; to determine the R circumscribed rectangles corresponding to the R regions of interest; and to determine the detection result of the target according to the R circumscribed rectangles.
  • The processing module is further configured to: when the ratio of the area of the first region of interest among the Q regions of interest to the convex envelope area of the first region of interest is greater than the first preset value, take the first region of interest as one of the R regions of interest;
  • when the ratio of the convex envelope aspect ratio of the first region of interest among the Q regions of interest to the aspect ratio of the pixel region corresponding to the first region of interest is greater than the second preset value and less than the third preset value, take the first region of interest as one of the R regions of interest;
  • and when the ratio of the convex envelope area of the first region of interest among the Q regions of interest to the area of the pixel region corresponding to the first region of interest is greater than the second preset value and less than the third preset value, take the first region of interest as one of the R regions of interest.
  • The processing module is further configured to use a pre-trained classifier to evaluate and score the content in the circumscribed rectangle of the first region of interest among the Q regions of interest to obtain a first score, and, when the first score is higher than a fifth preset value, to take the first region of interest as one of the R regions of interest.
  • The processing module is further configured to calculate the areas of the R circumscribed rectangles, and to determine the detection result of the target according to the areas of the R circumscribed rectangles.
  • The processing module is further configured to use a pre-trained classifier to evaluate and score the contents of the R circumscribed rectangles to obtain R scores, and to determine the detection result of the target according to the R scores.
  • A third aspect of the embodiments of the present application discloses a target detection apparatus, including a processor and a memory, where the memory is used to store one or more programs, the one or more programs include computer-executable instructions, and the processor is used to call the one or more programs stored in the memory to perform the following operations: acquiring a first image; processing the first image to obtain Q regions of interest, and determining the coordinates in the image coordinate system corresponding to the reference point in each region of interest; determining, according to the coordinates in the image coordinate system corresponding to the reference point in each region of interest, the coordinates in the vehicle body coordinate system corresponding to the reference point in each region of interest; determining a three-dimensional model of a first target object; determining, according to the coordinates in the vehicle body coordinate system corresponding to the reference point in each region of interest and the vertex set of the three-dimensional model of the first target object, at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system; projecting the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system onto the image coordinate system to obtain Q pixel regions; and determining the detection result of the target according to the Q regions of interest and the Q pixel regions.
  • The at least one processor is further configured to screen the Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest, where each pixel region corresponds to one region of interest, R is a positive integer, and R is less than or equal to Q; to determine the R circumscribed rectangles corresponding to the R regions of interest; and to determine the detection result of the target according to the R circumscribed rectangles.
  • The at least one processor is further configured to: when the ratio of the area of the first region of interest among the Q regions of interest to the convex envelope area of the first region of interest is greater than the first preset value, take the first region of interest as one of the R regions of interest;
  • when the ratio of the convex envelope aspect ratio of the first region of interest among the Q regions of interest to the aspect ratio of the pixel region corresponding to the first region of interest is greater than the second preset value and less than the third preset value, take the first region of interest as one of the R regions of interest;
  • and when the ratio of the convex envelope area of the first region of interest among the Q regions of interest to the area of the pixel region corresponding to the first region of interest is greater than the second preset value and less than the third preset value, take the first region of interest as one of the R regions of interest.
  • The at least one processor is further configured to use a pre-trained classifier to evaluate and score the content in the circumscribed rectangle of the first region of interest among the Q regions of interest to obtain a first score, and, when the first score is higher than a fifth preset value, to take the first region of interest as one of the R regions of interest.
  • The at least one processor is further configured to calculate the areas of the R circumscribed rectangles, and to determine the detection result of the target according to the areas of the R circumscribed rectangles.
  • The at least one processor is further configured to use a pre-trained classifier to evaluate and score the contents of the R circumscribed rectangles to obtain R scores, and to determine the detection result of the target according to the R scores.
  • The vertex set of the three-dimensional model of the first target object includes: a first upper vertex (0, 0, H1), and n equal points on the first bottom circle corresponding to the first upper vertex, (R1·cos(2πk/n), R1·sin(2πk/n), 0), where H1 represents the first height, R1 represents the radius of the first bottom circle, the center of the first bottom circle is the origin (0, 0, 0) of the three-dimensional model coordinate system, defined with the X axis pointing forward, the Y axis to the left, and the Z axis upward, and k = 0, 1, 2, …, n-1, where n is a positive integer.
  • The vertex set of the three-dimensional model of the first target object includes: a second upper vertex (0, 0, L·sin(π/3)), a left vertex (0, L/2, 0), and a right vertex (0, -L/2, 0), where L represents the side length, the center of the base is the origin (0, 0, 0), and the coordinate axes form the three-dimensional model coordinate system defined with the X axis pointing forward, the Y axis to the left, and the Z axis upward.
  • The vertex set of the three-dimensional model of the first target object includes: m equal points on the second bottom circle, (R2·cos(2πk/m), R2·sin(2πk/m), 0), and m equal points on the top circle corresponding to the second bottom circle, (R2·cos(2πk/m), R2·sin(2πk/m), H2), where H2 represents the second height, R2 represents the radius of the second bottom circle, k = 0, 1, 2, …, m-1, and m is a positive integer.
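The three vertex sets above (cone, warning triangle, and lying tire) can be generated directly from these parameterizations. The following is a minimal Python sketch, not the patent's own code: the function names are illustrative, the circle points use the standard n-equal-point parameterization assumed from the text, and the triangle's apex height L·sin(π/3) assumes an equilateral shape.

```python
import math

def cone_vertices(H1, R1, n=36):
    """First upper vertex (0, 0, H1) plus n equal points on the bottom circle."""
    verts = [(0.0, 0.0, H1)]
    for k in range(n):
        a = 2 * math.pi * k / n
        verts.append((R1 * math.cos(a), R1 * math.sin(a), 0.0))
    return verts

def triangle_vertices(L):
    """Upper, left, and right vertices of an upright triangle of side L whose
    base is centered at the origin (apex height assumes an equilateral shape)."""
    return [(0.0, 0.0, L * math.sin(math.pi / 3)),
            (0.0, L / 2, 0.0),
            (0.0, -L / 2, 0.0)]

def tire_vertices(H2, R2, m=36):
    """m equal points on the bottom circle and m on the corresponding top circle."""
    verts = []
    for k in range(m):
        a = 2 * math.pi * k / m
        verts.append((R2 * math.cos(a), R2 * math.sin(a), 0.0))
        verts.append((R2 * math.cos(a), R2 * math.sin(a), H2))
    return verts
```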
  • A fourth aspect of the embodiments of the present application discloses a chip system. The chip system includes at least one processor and an acquisition interface, and the at least one processor is configured to call a computer program through the acquisition interface to implement the method described in any one of the above aspects or any possible implementation thereof.
  • A fifth aspect of the embodiments of the present application discloses a computer-readable storage medium, where a computer program is stored in the storage medium; when the computer program runs on a computer, the method described in any one of the above aspects or any possible implementation thereof is implemented.
  • A sixth aspect of the embodiments of the present application discloses a vehicle, where the vehicle includes the target detection apparatus of the second aspect or the target detection apparatus of the third aspect.
  • FIG. 1 is a schematic diagram of a target obstacle in a traffic scene provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a target detection system provided by an embodiment of the present application.
  • FIG. 3 is an example diagram of a traffic scene provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a target detection method provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an image coordinate system provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a first image provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of another first image provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of still another first image provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a region of interest obtained by processing a first image according to an embodiment of the present application.
  • FIG. 10 is a schematic diagram of another region of interest obtained by processing the first image according to an embodiment of the present application.
  • FIG. 11 is a schematic diagram of another region of interest obtained by processing the first image according to an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a back projection process provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of a camera coordinate system and an image coordinate system provided by an embodiment of the present application.
  • FIG. 15 is a schematic diagram of a three-dimensional model of a first target object provided by an embodiment of the present application.
  • FIG. 16 is a schematic diagram of another three-dimensional model of a first target object provided by an embodiment of the present application.
  • FIG. 17 is a schematic diagram of still another three-dimensional model of a first target object provided by an embodiment of the present application.
  • FIG. 18 is a schematic diagram of the process of placing a three-dimensional model of a first target object at the position of the coordinates in the vehicle body coordinate system corresponding to the reference point, provided by an embodiment of the present application.
  • FIG. 19 is a schematic diagram of a back projection provided by an embodiment of the present application.
  • FIG. 20 is a schematic diagram of a pixel area provided by an embodiment of the present application.
  • FIG. 21 is a schematic diagram illustrating a pixel region in a first image provided by an embodiment of the present application.
  • FIG. 22 is a schematic diagram of another pixel region provided by an embodiment of the present application.
  • FIG. 23 is a schematic diagram of another pixel area provided by an embodiment of the present application.
  • FIG. 24 is a schematic diagram of a circumscribed rectangle provided by an embodiment of the present application.
  • FIG. 25 is a schematic diagram of a detection result of a target provided by an embodiment of the present application.
  • FIG. 26 is a schematic diagram of another detection result of a target provided by an embodiment of the present application.
  • FIG. 27 is a schematic diagram of still another detection result of a target provided by an embodiment of the present application.
  • FIG. 28 is a schematic diagram of a target detection apparatus provided by an embodiment of the present application.
  • FIG. 29 is a schematic diagram of an apparatus for detecting a target provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a target detection system 2000 provided by an embodiment of the present application.
  • The system includes an acquisition module 2001, a processing module 2002, and a planning and control module 2003, where the acquisition module 2001 is used to acquire the images to be detected; the processing module 2002 is used to detect obstacles in the to-be-detected images acquired by the acquisition module 2001; and the planning and control module 2003 is used to receive the output of the processing module 2002 and to plan and control the behavior of the movable platform itself.
  • The system 2000 can be applied to movable platforms such as vehicles and robots.
  • Three-dimensional (3D) projection refers to the process of mapping a point in a three-dimensional space onto a two-dimensional plane.
  • In the embodiments of the present application, 3D projection mainly refers to the process of mapping points in the world space onto a 2D image plane through a camera model (for example, a pinhole model).
  • Here, the world space can be the vehicle body coordinate system.
  • Back-projection is the inverse process of three-dimensional projection, that is, the process of mapping points in a two-dimensional plane into a three-dimensional space.
  • In the embodiments of the present application, back-projection mainly refers to the process of mapping points in the two-dimensional image plane into the world space through the camera model and certain geometric constraints (for example, the ideal ground plane assumption), where the world space can be the vehicle body coordinate system.
  • The flat-ground assumption means that the road on which the vehicle is traveling is considered to be an ideal plane. Based on this assumption, back-projection can be implemented: starting from a pixel belonging to the road surface in the two-dimensional image plane, the corresponding point on the ideal ground plane in the world space, that is, a plane defined in the vehicle body coordinate system, is found.
  • Convex hull: given a point set, the convex envelope of the point set is the convex polygon with the smallest area that contains all the points in the set. Intuitively, a convex polygon is a polygon without any concave vertices.
  • Non-maximum suppression refers to an algorithm that searches for local maxima and suppresses non-maximum values; it is often used for post-processing of detection boxes in target detection.
  • The input of the algorithm is a set of candidate boxes and a corresponding score for each candidate box, and the output is a subset of the candidate boxes.
  • The specific steps are: first, mark all boxes as not suppressed and sort all boxes by score from largest to smallest; then traverse the boxes starting from the one with the highest score, and for each box that is not suppressed, mark every remaining box whose overlap with it is greater than a threshold as suppressed; finally, return the boxes that are not suppressed.
  • Intersection-over-Union (IOU): IOU = area(C ∩ G) / area(C ∪ G), where area(C ∩ G) is the area of the intersection between a first region C and a second region G, and area(C ∪ G) is the area of the union between the first region C and the second region G.
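As a minimal sketch of the IOU formula and the non-maximum suppression steps described above (this is not the patent's code; the [x1, y1, x2, y2] box format and the 0.5 threshold are assumptions):

```python
def iou(c, g):
    """IOU = area(C ∩ G) / area(C ∪ G) for axis-aligned boxes [x1, y1, x2, y2]."""
    x1, y1 = max(c[0], g[0]), max(c[1], g[1])
    x2, y2 = min(c[2], g[2]), min(c[3], g[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((c[2] - c[0]) * (c[3] - c[1])
             + (g[2] - g[0]) * (g[3] - g[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, threshold=0.5):
    """Traverse boxes by descending score; suppress any box whose overlap with
    an already kept box exceeds the threshold; return the kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    suppressed = [False] * len(boxes)
    kept = []
    for i in order:
        if suppressed[i]:
            continue
        kept.append(i)
        for j in order:
            if j != i and not suppressed[j] and iou(boxes[i], boxes[j]) > threshold:
                suppressed[j] = True
    return kept
```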
  • For example, when the intersection-over-union of the convex envelope contour of the first region of interest among the Q regions of interest and the contour of the pixel region corresponding to the first region of interest is greater than a fourth preset value, the first region of interest is taken as one of the R regions of interest. If the fourth preset value is 70%, the region enclosed by the convex envelope contour of the first region of interest is the first region, and the region enclosed by the contour of the pixel region corresponding to the first region of interest may be the second region.
  • Clustering refers to the process of dividing a collection of physical or abstract objects into classes composed of similar objects.
  • A cluster generated by clustering is a collection of data objects that are similar to one another within the same cluster and dissimilar to objects in other clusters.
  • Technologies for real-time detection of target obstacles in traffic scenes mainly fall into two categories. The first category measures information such as the distance, speed, and azimuth of surrounding objects through sensors such as millimeter-wave radar and lidar based on physical or geometric principles, or forms a three-dimensional point cloud or depth information of the surrounding environment, in order to detect target obstacles.
  • The second category identifies objects in the visible-light images collected by a camera, based on processing, analyzing, and learning the imaging characteristics of the target object, so as to detect the target obstacle in the image.
  • In one approach, candidate target areas are detected and identified in order to detect traffic cones.
  • The specific process is as follows: first, obtain the squeeze-and-excitation network SENet and the dense convolutional network DenseNet; then determine the target network structure based on SENet, DenseNet, and a preset target detection model, that is, design a cascade network structure; then train the target network structure on original traffic-cone scene images containing multiple traffic cones to obtain a traffic cone recognition model; finally, input the image to be recognized into the traffic cone recognition model and output the recognition result.
  • The recognition result is either that there are traffic cones in the image to be recognized, together with their locations in the image, or that there are no traffic cones in the image to be recognized.
  • Detecting target obstacles in this way has the following disadvantages: the algorithmic complexity of the cascaded network framework is high and its computing requirements are large, so if the computing power of the computing platform is limited, it is difficult to deploy or to achieve real-time, effective detection. Moreover, a traditional target detection framework based on supervised learning needs to train a model and depends heavily on the quantity and distribution of training samples, so data collection and labeling at a certain scale must be carried out for different types of target obstacles; when the number and distribution of training samples are insufficient, it is difficult to achieve effective real-time detection.
  • In another approach, the target obstacle is detected by clustering the optical flows in the scene.
  • The specific process is as follows: first, the optical flow between adjacent frames is calculated; then, based on the optical flow field, pixels whose positions are close to each other and whose displacement vectors are similar are clustered, the criteria for clustering being whether there is a common focus of expansion and a common scale magnitude; finally, the clustered optical-flow clusters are output as the foreground target area, that is, the target obstacle detection result.
  • As shown in FIG. 3, part 3-1 represents an example of a traffic scene, which includes foreground objects such as a pedestrian (320) and a car (310) and a static background (e.g., ground, road, etc.); part 3-2 shows the corresponding feature space (composed of the abscissa X of the optical flow collection point, the ordinate Y of the optical flow collection point, and the optical flow vector scale), which contains the feature points (350) corresponding to the pedestrian, the feature points (340) corresponding to the car, and the feature subspace (330) corresponding to the static background.
  • The optical flow generated by the pedestrian crossing the road has a common convergence point and the same scale (S2), thus forming a feature point (350) in the feature space.
  • The optical flow generated by the car in front (310) has a common convergence point and the same scale (S1), thus forming a feature point (340) in the feature space.
  • The optical flow of the static background has a common convergence point but a wide scale distribution, so its corresponding feature subspace (330) is cylindrical rather than point-shaped. Since the optical flow convergence point of the pedestrian is clearly different from that of the static background, the feature points (350) corresponding to the pedestrian and the feature subspace (330) corresponding to the static background are far apart in the feature space and easy to distinguish. That is to say, by clustering the optical flow in the above feature space, the pedestrian in the example can be effectively detected.
  • Detection in this way has the following disadvantages: being based on global dense optical flow, the computational overhead is large, which challenges the real-time performance of the algorithm; furthermore, the first prerequisite for an object to be detected is that it produces a corresponding optical flow.
  • The execution body of the method is a movable platform, and the movable platform can be a vehicle, a robot, or the like.
  • FIG. 4 shows a target detection method provided by an embodiment of the present application. The method includes but is not limited to the following steps:
  • Step S401: Acquire a first image.
  • When the method of the embodiment of the present application is applied inside the camera, acquiring the first image refers to capturing the image with the camera; when the method is applied outside the camera, acquiring the first image refers to receiving an image from the camera.
  • The camera may be a monocular camera, a binocular camera, a multi-lens camera, or a surround-view camera, etc., which is not limited here.
  • Step S402: Process the first image to obtain Q regions of interest, and determine the coordinates in the image coordinate system corresponding to the reference point in each region of interest, where Q is a positive integer.
  • The Q regions of interest may be obtained by processing the first image either by limiting the color range of the first image or by extracting edges.
  • The process of processing the first image by limiting its color range is as follows: convert the color space of the first image into hue, saturation, value (HSV), and then limit the ranges of the three dimensions hue, saturation, and value to obtain the Q regions of interest.
  • The process of obtaining Q regions of interest by extracting edges is as follows: usually, the pixel value distributions of an object and of the road surface differ, so obvious edge features tend to appear at the boundary of the object due to large pixel-value changes; therefore, processing the first image with an edge operator yields the Q regions of interest.
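The edge operator itself is not specified by the patent; as a hedged sketch, the Canny detector with assumed thresholds can play this role:

```python
import cv2

def edge_regions_of_interest(bgr_image):
    """Extract candidate regions from edge features: strong pixel-value
    changes at object boundaries against the road surface become contours."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)  # edge operator; thresholds are assumptions
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return contours  # candidate regions of interest
```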
  • The reference point in each region of interest may be the bottom midpoint of each region of interest, or may be another point in each region of interest, which is not limited here, as long as a uniform standard is adopted.
  • The image coordinate system can be divided into an image pixel coordinate system and an image physical coordinate system.
  • The origin O1 of the image physical coordinate system is the intersection of the camera optical axis and the imaging plane, that is, the principal point, and its X axis and Y axis are respectively parallel to the Xc axis and Yc axis of the camera coordinate system; it is a plane Cartesian coordinate system.
  • The image pixel coordinate system is a plane Cartesian coordinate system fixed on the image in units of pixels; its origin O0 is located at the upper left corner of the image, and its u axis and v axis are parallel to the X axis and Y axis of the image physical coordinate system. The coordinates of the principal point in the uv coordinate system are (u0, v0), as shown in FIG. 5.
  • The coordinates of a point on the image in the image physical coordinate system and in the image pixel coordinate system can be converted into each other.
  • The specific conversion process is as follows. Suppose the coordinates of a point on the image in the image physical coordinate system are (x, y), its coordinates in the image pixel coordinate system are (u, v), and (u0, v0) are the pixel coordinates of the principal point in the image pixel coordinate system. Let du and dv be the physical dimensions of one pixel along the X axis and Y axis, respectively. Then u = x/du + u0 and v = y/dv + v0.
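In code form, the conversion above is a direct transcription:

```python
def physical_to_pixel(x, y, du, dv, u0, v0):
    """(x, y) in the image physical coordinate system -> (u, v) in pixels:
    u = x/du + u0, v = y/dv + v0."""
    return x / du + u0, y / dv + v0
```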
  • A specific example of processing the first image to obtain Q regions of interest is as follows:
  • Assume the first image is as shown in FIG. 6 and includes traffic cones. The color space of the first image is converted into HSV, and the ranges of the three dimensions hue H, saturation S, and value V are limited. For example, the orange-red subspace in the HSV color space is limited to 0° ≤ H ≤ 10° or 160° ≤ H ≤ 180°; 70 ≤ S ≤ 255; 100 ≤ V ≤ 255, thus obtaining the Q regions of interest.
  • Alternatively, the first image is as shown in FIG. 7 and includes a warning triangle for motor vehicles. The color space of the first image is converted into HSV, and the ranges of the three dimensions hue H, saturation S, and value V are limited. For example, the orange-red subspace in the HSV color space is limited to 0° ≤ H ≤ 10° or 160° ≤ H ≤ 180°; 70 ≤ S ≤ 255; 100 ≤ V ≤ 255, thus obtaining the Q regions of interest.
  • Alternatively, the first image is as shown in FIG. 8 and includes a flat tire. The color space of the first image is converted into HSV, and the ranges of the three dimensions hue H, saturation S, and value V are limited. For example, the black subspace in the HSV color space is limited to 0° ≤ H ≤ 120°; 0 ≤ S ≤ 100; 20 ≤ V ≤ 100, thus obtaining the Q regions of interest.
  • In addition, a region of interest whose number of pixels is less than S is filtered out, where S is a positive integer.
  • For example, if the first image is as shown in FIG. 6 and the value of S is 5, processing the first image yields Q+1 regions of interest; if one of the Q+1 regions of interest contains 2 pixels, that region of interest is filtered out, and finally the Q regions of interest shown in FIG. 9 are obtained.
  • If the first image is as shown in FIG. 7 and the value of S is 5, processing the first image yields Q+2 regions of interest; if among the Q+2 regions of interest one region contains 2 pixels and another contains 3 pixels, those two regions of interest are filtered out, and finally the Q regions of interest shown in FIG. 10 are obtained.
  • If the first image is as shown in FIG. 8 and the value of S is 15, processing the first image yields Q+1 regions of interest; if one of the Q+1 regions of interest contains 10 pixels, that region of interest is filtered out, and finally the Q regions of interest shown in FIG. 11 are obtained.
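A minimal sketch combining the HSV color-range limiting and the small-region filtering described above (not the patent's code; OpenCV stores H as 0..179, i.e. degrees divided by two, so the angular ranges of the orange-red example are rescaled, and min_pixels plays the role of the threshold S):

```python
import cv2

def color_regions_of_interest(bgr_image, min_pixels=5):
    """Limit the HSV color range (orange-red example), then drop connected
    regions that contain fewer than min_pixels pixels."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    # 0° <= H <= 10° or 160° <= H <= 180°; 70 <= S <= 255; 100 <= V <= 255
    low = cv2.inRange(hsv, (0, 70, 100), (5, 255, 255))
    high = cv2.inRange(hsv, (80, 70, 100), (89, 255, 255))
    mask = cv2.bitwise_or(low, high)
    # connected components of the mask are the candidate regions of interest
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    regions = []
    for i in range(1, n):  # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= min_pixels:
            regions.append(labels == i)  # boolean mask of one region
    return regions
```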
  • Step S403: Determine the coordinates in the vehicle body coordinate system corresponding to the reference point in each region of interest according to the coordinates in the image coordinate system corresponding to the reference point in each region of interest.
  • This process can be simply referred to as back-projection.
  • The reference point in each region of interest is assumed to correspond to an object on the ground, and the purpose of back-projection is to find the coordinates in the vehicle body coordinate system corresponding to this assumed ground object.
  • The origin O3 of the vehicle body coordinate system is the projection of the center point of the rear axle of the vehicle onto the ideal ground plane, that is, a plane defined in the vehicle body coordinate system; the coordinate axes are defined with the Xw axis pointing forward, the Yw axis to the left, and the Zw axis upward.
  • According to the coordinates in the image coordinate system corresponding to the reference point in each region of interest, the coordinates in the vehicle body coordinate system corresponding to the reference point in each region of interest are determined; that is to say, the reference point in each region of interest is converted from the image coordinate system to the vehicle body coordinate system. The specific conversion relationship is E_norm = K^(-1)·e, where E_norm represents the coordinates, in the normalized image coordinate system, of the reference point in each region of interest, K represents the camera's intrinsic parameter matrix, and e represents the homogeneous image coordinates corresponding to the reference point in each region of interest.
  • An ideal ground plane is defined in the vehicle body coordinate system. Since the origin O3 of the vehicle body coordinate system is the projection of the center point of the rear axle of the vehicle onto the ideal ground plane, with the Xw axis pointing forward and the Yw axis to the left, the normal vector n of the ideal ground plane and the origin O3 are known in the vehicle body coordinate system.
  • Using the transformation matrix from the vehicle body coordinate system to the camera coordinate system, the above normal vector n and the origin O3 are converted from the vehicle body coordinate system into the camera coordinate system, and the corresponding expression of the ideal ground plane in the camera coordinate system is obtained in the form Ax + By + Cz + D = 0, where A, B, C, and D are known constants, and A, B, and C are not all zero.
  • Intersecting the viewing ray through E_norm with this plane and applying the transformation matrix from the camera coordinate system to the vehicle body coordinate system yields the point corresponding to the reference point in the vehicle body coordinate system.
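A hedged sketch of this back-projection under the flat-ground assumption (names are assumptions: K is the intrinsic matrix, R_cw and t_cw transform vehicle body coordinates into camera coordinates, and plane_cam holds the plane Ax + By + Cz + D = 0 already expressed in the camera coordinate system):

```python
import numpy as np

def back_project_to_ground(u, v, K, R_cw, t_cw, plane_cam):
    """Map an image pixel onto the ideal ground plane and return its
    coordinates in the vehicle body coordinate system."""
    e = np.array([u, v, 1.0])
    d = np.linalg.inv(K) @ e            # E_norm = K^(-1) e: normalized ray
    A, B, C, D = plane_cam
    n = np.array([A, B, C])
    s = -D / (n @ d)                    # scale at which the ray meets the plane
    p_cam = s * d                       # intersection point, camera coordinates
    p_body = R_cw.T @ (p_cam - t_cw)    # camera -> vehicle body coordinates
    return p_body                       # (Xw, Yw, Zw), with Zw ≈ 0 on the plane
```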
  • As shown in FIG. 12, the positions of the coordinates in the image coordinate system corresponding to the reference points in each region of interest are 1001, 1002, 1003, 1004, 1005, and 1006, respectively.
  • Using the camera's external parameters, intrinsic parameter matrix, and scale parameter, the positions of the coordinates in the vehicle body coordinate system corresponding to the reference points in each region of interest are determined as 1007, 1008, 1009, 1010, 1011, and 1012, respectively.
  • The correspondence between the position of the coordinates in the image coordinate system corresponding to the reference point in each region of interest and the position of the coordinates in the vehicle body coordinate system corresponding to the reference point in each region of interest is shown in Table 1.
  • Step S404: Determine the three-dimensional model of the first target object.
  • The three-dimensional model of the first target object may be obtained by three-dimensional modeling of a traffic cone, a warning triangle for motor vehicles, a flat tire, and the like, which is not limited here. Since the specific parameters of traffic cones, warning triangles for motor vehicles, and tires are clearly stipulated in national or international standards, traffic cones, warning triangles for motor vehicles, and tires can be used as reference objects to define the three-dimensional model coordinate system, so as to determine the vertex set of the three-dimensional model of the first target object.
  • For example, the vertex set of the three-dimensional model of the first target object includes a first upper vertex (0, 0, H1) and n equal points on the first bottom circle corresponding to the first upper vertex, (R1·cos(2πk/n), R1·sin(2πk/n), 0), where H1 represents the height, R1 represents the radius of the first bottom circle, k = 0, 1, 2, …, n-1, and n is a positive integer. In one example, the value of n is 36.
  • The three-dimensional model of the lying tire (105) is obtained by three-dimensional modeling as follows: define a three-dimensional model coordinate system with the center of the second bottom circle as the origin (0, 0, 0), the X axis pointing forward, the Y axis to the left, and the Z axis upward.
  • The vertex set of the three-dimensional model of the first target object then includes m equal points on the second bottom circle, (R2·cos(2πk/m), R2·sin(2πk/m), 0), and m equal points on the top circle corresponding to the second bottom circle, (R2·cos(2πk/m), R2·sin(2πk/m), H2), where H2 represents the height, R2 represents the radius of the second bottom circle, k = 0, 1, 2, …, m-1, and m is a positive integer. In one example, the value of m is 36.
  • Step S405: According to the coordinates in the vehicle body coordinate system corresponding to the reference point in each region of interest and the vertex set of the three-dimensional model of the first target object, determine at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system.
  • This process can be considered as placing the three-dimensional model of the first target object at the position of the coordinates in the vehicle body coordinate system corresponding to the reference point in each region of interest; that is, the coordinates, in the three-dimensional model coordinate system, of all the points in the vertex set of the three-dimensional model of the first target object are translated from the three-dimensional model coordinate system into the vehicle body coordinate system.
  • Suppose the coordinates in the vehicle body coordinate system corresponding to the reference point in a region of interest are (X_GP, Y_GP, Z_GP), and a point in the vertex set of the three-dimensional model of the first target object has coordinates (P_X, P_Y, P_Z) in the three-dimensional model coordinate system. When the three-dimensional model of the first target object is placed at the reference point, the coordinates of that point of the vertex set in the vehicle body coordinate system are (X_GP + P_X, Y_GP + P_Y, Z_GP + P_Z). In this way, the coordinates in the vehicle body coordinate system of all points in the vertex set of the three-dimensional model of the first target object can be determined; that is to say, the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system is determined.
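In code, the placement is a pure translation of every vertex (a one-function NumPy sketch):

```python
import numpy as np

def place_model(vertices, ref_body):
    """Translate model vertices from the 3D model coordinate system into the
    vehicle body coordinate system: each (P_X, P_Y, P_Z) becomes
    (X_GP + P_X, Y_GP + P_Y, Z_GP + P_Z)."""
    return np.asarray(vertices, dtype=float) + np.asarray(ref_body, dtype=float)
```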
  • FIG. 18 shows the process of placing the three-dimensional model of the first target object at the position of the coordinates in the vehicle body coordinate system corresponding to the reference point in each region of interest; that is to say, the reference point in each region of interest corresponds to one three-dimensional model of the first target object, and the Q regions of interest correspond to Q three-dimensional models of the first target object.
  • The correspondence between the position of the coordinates in the image coordinate system corresponding to the reference point in each region of interest, the position of the coordinates in the vehicle body coordinate system corresponding to the reference point in each region of interest, and the position of the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system is shown in Table 2.
  • Step S406: Project the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system onto the image coordinate system to obtain Q pixel regions.
  • This process can be referred to as three-dimensional projection.
  • Using the camera's external parameters, intrinsic parameter matrix, and scale parameter, the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system is converted from the vehicle body coordinate system into the image coordinate system; that is, at least one coordinate of the three-dimensional model of the first target object in the image coordinate system is determined. The contour of the point set formed by the coordinates of the three-dimensional model of the first target object in the image coordinate system is then taken, obtaining the Q pixel regions, where Q is a positive integer.
  • The at least one coordinate of the three-dimensional model of the first target object in the image coordinate system is determined from the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system as follows. Let (Xc, Yc, Zc) denote the coordinates of an object point P in space in the camera coordinate system, (Xw, Yw, Zw) its coordinates in the vehicle body coordinate system, f the focal length of the camera, Zc the scale parameter, R3 the rotation matrix, t the displacement vector, and 0^T = (0, 0, 0)^T. The projection can then be written as formula (7): Zc·[u, v, 1]^T = K·[R3 | t]·[Xw, Yw, Zw, 1]^T, where [u, v]^T represents a coordinate of the three-dimensional model of the first target object in the image coordinate system, [Xw, Yw, Zw]^T represents the corresponding coordinate in the vehicle body coordinate system, K is the camera's intrinsic parameter matrix, and (Xc, Yc, Zc) = R3·[Xw, Yw, Zw]^T + t can also be regarded as the corresponding coordinate of the three-dimensional model of the first target object in the camera coordinate system.
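A hedged sketch of formula (7) (names are assumptions; R3 and t transform vehicle body coordinates into camera coordinates). Taking the contour of the returned pixel point set, for example its convex hull, would then give one pixel region:

```python
import numpy as np

def project_to_image(points_body, K, R3, t):
    """Zc * [u, v, 1]^T = K [R3 | t] [Xw, Yw, Zw, 1]^T for each vertex."""
    pixels = []
    for p in np.asarray(points_body, dtype=float):
        p_cam = R3 @ p + t                 # (Xc, Yc, Zc)
        uvw = K @ p_cam                    # homogeneous pixel coordinates
        pixels.append((uvw[0] / uvw[2], uvw[1] / uvw[2]))  # divide by Zc
    return pixels
```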
  • Assume the positions of the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system are 1107, 1108, 1109, 1110, 1111, and 1112, respectively; then projecting the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system onto the image coordinate system yields the Q pixel regions.
  • The positions 1107, 1108, 1109, 1110, 1111, and 1112 of the coordinates of the three-dimensional model of the first target object in the vehicle body coordinate system correspond to the pixel regions 1201, 1202, 1203, 1204, 1205, and 1206, respectively.
  • The Q pixel regions are schematically shown in FIG. 20.
  • The correspondence between the position of the coordinates in the image coordinate system corresponding to the reference point in each region of interest, the position of the coordinates in the vehicle body coordinate system corresponding to the reference point in each region of interest, the position of the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system, and the Q pixel regions is shown in Table 3.
  • For example, if the first image is as shown in FIG. 7, the Q regions of interest obtained by processing the first image are as shown in FIG. 10. Determine the coordinates in the vehicle body coordinate system corresponding to the reference point in each region of interest, and determine the three-dimensional model of the first target object, obtained here by three-dimensional modeling of the warning triangle for motor vehicles. Then, according to the coordinates in the vehicle body coordinate system corresponding to the reference point in each region of interest and the vertex set of the three-dimensional model of the first target object, determine the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system, and project the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system onto the image coordinate system to obtain the Q pixel regions, as shown in FIG. 22.
  • Similarly, if the first image is as shown in FIG. 8, the Q regions of interest obtained by processing the first image are as shown in FIG. 11. Determine the coordinates in the vehicle body coordinate system corresponding to the reference point in each region of interest, and determine the three-dimensional model of the first target object, obtained here by three-dimensional modeling of the flat tire. Then, according to the coordinates in the vehicle body coordinate system corresponding to the reference point in each region of interest and the vertex set of the three-dimensional model of the first target object, determine the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system, and project the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system onto the image coordinate system to obtain the Q pixel regions, as shown in FIG. 23.
  • Step S407: Determine the detection result of the target according to the Q regions of interest and the Q pixel regions.
  • Specifically, the Q regions of interest corresponding to the Q pixel regions are screened to obtain R regions of interest, where each pixel region corresponds to one region of interest, R is a positive integer, and R is less than or equal to Q; then the R circumscribed rectangles corresponding to the R regions of interest are determined, and the detection result of the target is determined according to the R circumscribed rectangles.
  • Associating each pixel region with a region of interest can be considered a clustering process.
  • For example, the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system is projected onto the image coordinate system to obtain the Q pixel regions shown in FIG. 20, which includes six pixel regions: pixel region 1201, pixel region 1202, pixel region 1203, pixel region 1204, pixel region 1205, and pixel region 1206.
  • The region of interest corresponding to pixel region 1201 is the white area included in pixel region 1201, and the region of interest corresponding to pixel region 1202 is the white area included in pixel region 1202; the regions of interest corresponding to pixel regions 1203, 1204, 1205, and 1206 are defined in the same way and are not repeated here.
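A minimal sketch of this association step, treating it as the clustering described above: each region of interest is assigned to the pixel region that contains it (the mask-based representation is an assumption):

```python
import numpy as np

def match_regions(roi_masks, pixel_region_masks):
    """Pair each pixel region with the region of interest whose pixels all
    fall inside it (the 'white area' contained in the pixel region)."""
    pairs = []
    for pr in pixel_region_masks:
        for roi in roi_masks:
            overlap = np.logical_and(pr, roi).sum()
            if overlap > 0 and overlap == roi.sum():  # ROI lies inside region
                pairs.append((pr, roi))
                break
    return pairs
```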
  • The screening of the Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest may involve the following five screening rules:
  • Rule 1: When the ratio of the area of the first region of interest to the convex envelope area of the first region of interest is greater than the first preset value, the first region of interest is taken as one of the R regions of interest.
  • For example, the first preset value may be 50%. Suppose the area of the first region of interest among the Q regions of interest is 20 pixels and the convex envelope area of the first region of interest is 25 pixels; since the ratio of 20 pixels to 25 pixels is 80%, which is greater than 50%, the first region of interest is retained as one of the R regions of interest.
  • Rule 2: When the ratio of the convex envelope aspect ratio of the first region of interest among the Q regions of interest to the aspect ratio of the pixel region corresponding to the first region of interest is greater than the second preset value and less than the third preset value, the first region of interest is taken as one of the R regions of interest.
  • In one example, the second preset value is 0.5 and the third preset value is 2.
  • Rule 3: When the ratio of the convex hull area of the first region of interest among the Q regions of interest to the area of the pixel region corresponding to the first region of interest is greater than the second preset value and less than the third preset value, the first region of interest is taken as one of the R regions of interest. In one example, the second preset value is 0.5 and the third preset value is 2.
  • Rule 4: When the intersection-over-union (IoU) of the convex hull contour of the first region of interest among the Q regions of interest and the contour of the pixel region corresponding to the first region of interest is greater than a fourth preset value, the first region of interest is taken as one of the R regions of interest. In one example, the fourth preset value is 70%.
  • Rule 5: A pre-trained classifier is used to evaluate and score the content in the circumscribed rectangle of the first region of interest among the Q regions of interest, obtaining a first score; when the first score is higher than a fifth preset value, the first region of interest is taken as one of the R regions of interest.
  • The above Rule 1, Rule 2, Rule 3, Rule 4 and Rule 5 can be combined arbitrarily, which is not limited here. For example, when Rule 1, Rule 2, Rule 3 and Rule 4 are satisfied simultaneously — that is, when the ratio of the area of the first region of interest to its convex hull area is greater than the first preset value, the ratio of the convex hull aspect ratio of the first region of interest to the aspect ratio of the corresponding pixel region is greater than the second preset value and less than the third preset value, the ratio of the convex hull area of the first region of interest to the area of the corresponding pixel region is greater than the second preset value and less than the third preset value, and the IoU of the convex hull contour of the first region of interest and the contour of the corresponding pixel region is greater than the fourth preset value — the first region of interest is taken as one of the R regions of interest.
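  • For concreteness, Rules 1 to 4 can be sketched as follows in Python with OpenCV. The binary masks, function names and the thresholds (50%, 0.5, 2 and 70%, taken from the examples above) are illustrative assumptions rather than a prescribed implementation:

```python
import cv2
import numpy as np

def passes_rules(roi_mask, pixel_region_mask):
    """Apply screening Rules 1-4 to one region of interest.

    roi_mask / pixel_region_mask: uint8 binary masks (255 = inside) of the
    region of interest and of its corresponding projected pixel region.
    """
    roi_pts = cv2.findNonZero(roi_mask)
    region_pts = cv2.findNonZero(pixel_region_mask)
    if roi_pts is None or region_pts is None:
        return False
    hull = cv2.convexHull(roi_pts)
    hull_area = cv2.contourArea(hull)
    roi_area = float(cv2.countNonZero(roi_mask))

    # Rule 1: ROI area / convex-hull area must exceed the first preset value (50%).
    if hull_area <= 0 or roi_area / hull_area <= 0.5:
        return False

    # Rule 2: ratio of hull aspect ratio to pixel-region aspect ratio in (0.5, 2).
    _, _, hw, hh = cv2.boundingRect(hull)
    _, _, pw, ph = cv2.boundingRect(region_pts)
    if not (0.5 < (hw / hh) / (pw / ph) < 2.0):
        return False

    # Rule 3: hull area / pixel-region area in (0.5, 2).
    region_area = float(cv2.countNonZero(pixel_region_mask))
    if not (0.5 < hull_area / region_area < 2.0):
        return False

    # Rule 4: IoU of the hull region and the pixel region > fourth preset value (70%).
    hull_mask = np.zeros_like(roi_mask)
    cv2.fillPoly(hull_mask, [hull.reshape(-1, 2)], 255)
    inter = cv2.countNonZero(cv2.bitwise_and(hull_mask, pixel_region_mask))
    union = cv2.countNonZero(cv2.bitwise_or(hull_mask, pixel_region_mask))
    return union > 0 and inter / union > 0.7
```

  • Rule 5 would add a classifier-score check on the circumscribed rectangle, as described above.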
  • In a possible implementation, determining the detection result of the target according to the R circumscribed rectangles includes: calculating the areas of the R circumscribed rectangles; and determining the detection result of the target according to the areas of the R circumscribed rectangles.
  • Specifically, a non-maximum suppression algorithm can be used to determine the detection result of the target. First, all R circumscribed rectangles are marked as not suppressed; then the R circumscribed rectangles are sorted by area in descending order; the traversal starts from the rectangle with the largest area, and for each rectangle that is not suppressed, all rectangles whose degree of overlap with it exceeds a threshold are marked as suppressed; finally, the rectangles that are not suppressed are returned.
  • other methods may also be used to determine the detection result of the target according to the area of the R circumscribed rectangles, which is not limited in this embodiment of the present application.
  • In a possible implementation, determining the detection result of the target according to the R circumscribed rectangles includes: using a pre-trained classifier to evaluate and score the contents of the R circumscribed rectangles to obtain R scores; and determining the detection result of the target according to the R scores.
  • Specifically, after the R scores are obtained, a non-maximum suppression algorithm can be used to determine the detection result of the target. First, all R circumscribed rectangles are marked as not suppressed; then the R circumscribed rectangles are sorted by score in descending order; the traversal starts from the rectangle with the highest score, and for each rectangle that is not suppressed, all rectangles whose degree of overlap with it exceeds a threshold are marked as suppressed; finally, the rectangles that are not suppressed are returned.
  • other methods may also be used to determine the detection result of the target according to the R scores, which is not limited in this embodiment of the present application.
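  • The two non-maximum suppression variants above differ only in their sort key (rectangle area or classifier score). The following is a minimal sketch, assuming detection boxes in the [x1, y1, w1, h1] format used below and IoU as the degree of coincidence; the threshold value and names are illustrative:

```python
def nms(rects, keys, iou_thresh=0.5):
    """Non-maximum suppression over circumscribed rectangles.

    rects: list of [x, y, w, h] boxes; keys: per-rectangle sort key
    (area for the area-priority method, classifier score otherwise).
    Returns the rectangles that are not suppressed.
    """
    def iou(a, b):
        ax2, ay2 = a[0] + a[2], a[1] + a[3]
        bx2, by2 = b[0] + b[2], b[1] + b[3]
        iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
        ih = max(0, min(ay2, by2) - max(a[1], b[1]))
        inter = iw * ih
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union > 0 else 0.0

    order = sorted(range(len(rects)), key=lambda i: keys[i], reverse=True)
    suppressed = [False] * len(rects)
    for i in order:                       # traverse from the largest key
        if suppressed[i]:
            continue
        for j in order:                   # suppress everything overlapping it
            if j != i and not suppressed[j] and iou(rects[i], rects[j]) > iou_thresh:
                suppressed[j] = True
    return [rects[i] for i in range(len(rects)) if not suppressed[i]]

# Area-priority variant:  nms(rects, [w * h for (_, _, w, h) in rects])
# Score-priority variant: nms(rects, classifier_scores)
```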
  • In one example, suppose the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system is projected onto the image coordinate system to obtain Q pixel regions, as shown in FIG. 20, where Q is 6, each pixel region corresponds to one region of interest, and the regions of interest corresponding to the 6 pixel regions are filtered through the filtering rules.
  • Here the filtering rule is that Rule 1, Rule 2, Rule 3 and Rule 4 must be satisfied simultaneously. It is determined that the regions of interest corresponding to pixel region 1201, pixel region 1202 and pixel region 1206 in FIG. 20 do not satisfy the above filtering rules, so they are filtered out, while the regions of interest corresponding to pixel region 1203, pixel region 1204 and pixel region 1205 satisfy the filtering rules and are retained.
  • A rectangular bounding box, i.e., a detection box, is then made for each of the regions of interest corresponding to pixel regions 1203, 1204 and 1205, namely circumscribed rectangle 1401, circumscribed rectangle 1402 and circumscribed rectangle 1403. The specific format is [x1, y1, w1, h1], where (x1, y1) is the image coordinate of the upper-left corner of the rectangular box and (w1, h1) is its pixel width and height, as shown in FIG. 24.
  • In the first method, the areas of circumscribed rectangle 1401, circumscribed rectangle 1402 and circumscribed rectangle 1403 are calculated, and according to the non-maximum suppression algorithm that gives priority to the larger area, circumscribed rectangles 1402 and 1403 are removed; circumscribed rectangle 1401 is therefore output, and the detection result of the target, i.e., the position of the traffic cone in the first image, is finally obtained, as shown in FIG. 25.
  • In the second method, a pre-trained classifier is used to evaluate and score the contents of circumscribed rectangle 1401, circumscribed rectangle 1402 and circumscribed rectangle 1403, obtaining 3 scores. Assume circumscribed rectangles 1401, 1402 and 1403 are initially not suppressed; they are sorted by score in descending order; the traversal starts from the rectangle with the highest score, and for each rectangle that is not suppressed, all rectangles whose degree of overlap with it exceeds the threshold are marked as suppressed; finally, the rectangle that is not suppressed, i.e., circumscribed rectangle 1401, is returned, and the detection result of the target, i.e., the position of the traffic cone in the first image, is finally obtained, as shown in FIG. 25.
  • In one example, suppose the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system is projected onto the image coordinate system to obtain Q pixel regions, as shown in FIG. 22, where Q is 2, each pixel region corresponds to one region of interest, and the regions of interest corresponding to the two pixel regions are filtered through the filtering rules.
  • Here the filtering rule is that Rule 1, Rule 2, Rule 3 and Rule 4 must be satisfied simultaneously. The region of interest corresponding to pixel region 1902 in FIG. 22 does not satisfy the above filtering rules and is filtered out, while the region of interest corresponding to pixel region 1901 satisfies the filtering rules and is retained.
  • A rectangular bounding box, i.e., a detection box, in the format [x1, y1, w1, h1] is then made for the region of interest corresponding to pixel region 1901, and the detection result of the target, i.e., the position of the vehicle warning triangle in the first image, is determined according to the circumscribed rectangle, as shown in FIG. 26.
  • In one example, suppose the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system is projected onto the image coordinate system to obtain Q pixel regions, as shown in FIG. 23, where each pixel region corresponds to one region of interest, and the regions of interest corresponding to the Q pixel regions are filtered through the filtering rules.
  • Here the filtering rule is that Rule 1, Rule 2, Rule 3 and Rule 4 must be satisfied simultaneously. The region of interest corresponding to pixel region 2501 in FIG. 23 satisfies the above filtering rules and is retained, while the regions of interest corresponding to the pixel regions other than pixel region 2501 do not satisfy the filtering rules and are filtered out. A rectangular bounding box, i.e., a detection box, is then made for the region of interest corresponding to pixel region 2501, in the format [x1, y1, w1, h1], where (x1, y1) is the image coordinate of the upper-left corner of the rectangular box and (w1, h1) is its pixel width and height. Then, according to the circumscribed rectangle, the detection result of the target, i.e., the position of the flat-lying tire in the first image, is determined, as shown in FIG. 27.
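  • To make the [x1, y1, w1, h1] detection-box format concrete, a retained region of interest can be converted into its circumscribed rectangle as sketched below; the function and variable names are illustrative assumptions:

```python
import cv2

def circumscribed_rectangle(roi_mask):
    """Return the detection box [x1, y1, w1, h1] of a retained region of interest,
    where (x1, y1) is the top-left image coordinate and (w1, h1) the pixel width/height."""
    points = cv2.findNonZero(roi_mask)   # all pixels belonging to the region
    x1, y1, w1, h1 = cv2.boundingRect(points)
    return [x1, y1, w1, h1]
```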
  • FIG. 28 is a schematic structural diagram of a target detection apparatus 2800 provided by an embodiment of the present application.
  • the target detection apparatus may include an acquisition module 2801 and a processing module 2802 , wherein the detailed description of each module is as follows.
  • The acquisition module 2801 is configured to acquire a first image; the processing module 2802 is configured to process the first image to obtain Q regions of interest and determine the coordinates, in the image coordinate system, of the reference point in each region of interest, where Q is a positive integer; the processing module 2802 is configured to determine, according to the coordinates of the reference point in each region of interest in the image coordinate system, the coordinates of the reference point in each region of interest in the vehicle body coordinate system; the processing module 2802 is configured to determine a three-dimensional model of a first target object; the processing module 2802 is configured to determine, according to the coordinates of the reference point in each region of interest in the vehicle body coordinate system and the vertex set of the three-dimensional model of the first target object, the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system; the processing module 2802 is configured to project the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system onto the image coordinate system to obtain Q pixel regions; and the processing module 2802 is configured to determine the detection result of the target according to the at least one region of interest and the Q pixel regions.
  • In a possible implementation, the processing module 2802 is further configured to screen the Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest, where each pixel region corresponds to one region of interest, R is a positive integer, and R is less than or equal to Q; determine the R circumscribed rectangles corresponding to the R regions of interest; and determine the detection result of the target according to the R circumscribed rectangles.
  • In still another possible implementation, the processing module 2802 is further configured to: when the ratio of the area of the first region of interest among the Q regions of interest to the area of the convex hull of the first region of interest is greater than the first preset value, take the first region of interest as one of the R regions of interest; when the ratio of the convex hull aspect ratio of the first region of interest among the Q regions of interest to the aspect ratio of the pixel region corresponding to the first region of interest is greater than the second preset value and less than the third preset value, take the first region of interest as one of the R regions of interest; when the ratio of the convex hull area of the first region of interest among the Q regions of interest to the area of the pixel region corresponding to the first region of interest is greater than the second preset value and less than the third preset value, take the first region of interest as one of the R regions of interest; and when the IoU of the convex hull contour of the first region of interest among the Q regions of interest and the contour of the pixel region corresponding to the first region of interest is greater than the fourth preset value, take the first region of interest as one of the R regions of interest.
  • In still another possible implementation, the processing module 2802 is further configured to use a pre-trained classifier to evaluate and score the content in the circumscribed rectangle of the first region of interest among the Q regions of interest to obtain a first score; and, when the first score is higher than the fifth preset value, take the first region of interest as one of the R regions of interest.
  • the processing module 2802 is further configured to calculate the area of the R circumscribed rectangles; and determine the detection result of the target according to the area of the R circumscribed rectangles.
  • the processing module 2802 is further configured to use a pre-trained classifier to evaluate and score the contents in the R circumscribed rectangles to obtain R scores; according to the R A score determines the detection result of the target.
  • In still another possible implementation, the vertex set of the three-dimensional model of the first target object includes: a first upper vertex (0, 0, H1) and n equal-division points on the first bottom circle corresponding to the first upper vertex, (R1·cos(2πk/n), R1·sin(2πk/n), 0), where H1 denotes the first height, R1 denotes the radius of the first bottom circle corresponding to the first upper vertex, the three-dimensional model coordinate system is defined with the center of the first bottom circle as the origin (0, 0, 0), the X axis pointing forward, the Y axis pointing left and the Z axis pointing up, k = 0, 1, 2, …, n−1, and n is a positive integer.
  • In still another possible implementation, the vertex set of the three-dimensional model of the first target object includes: a second upper vertex (0, 0, L·cos(π/3)), a left vertex (0, L/2, 0) and a right vertex (0, −L/2, 0), where L denotes the side length, and the three-dimensional model coordinate system is defined with the center of the base edge as the origin (0, 0, 0), the X axis pointing forward, the Y axis pointing left and the Z axis pointing up.
  • In still another possible implementation, the vertex set of the three-dimensional model of the first target object includes: m equal-division points on the second bottom circle, (R2·cos(2πk/m), R2·sin(2πk/m), 0), and m equal-division points on the top circle corresponding to the second bottom circle, (R2·cos(2πk/m), R2·sin(2πk/m), H2), where H2 denotes the second height, R2 denotes the radius of the second bottom circle, k = 0, 1, 2, …, m−1, and m is a positive integer.
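  • The three vertex sets above can be generated directly from their parameters. Below is a minimal sketch; the default dimensions are the standard-based values used in the examples of the description (0.7 m / 0.15 m cone, 0.5 m triangle, 0.125 m / 0.356 m cylinder), with n = m = 36 as in the embodiments:

```python
import numpy as np

def cone_vertices(H1=0.7, R1=0.15, n=36):
    """First upper vertex (0,0,H1) plus n equal-division points on the bottom circle."""
    k = np.arange(n)
    base = np.stack([R1 * np.cos(2 * np.pi * k / n),
                     R1 * np.sin(2 * np.pi * k / n),
                     np.zeros(n)], axis=1)
    return np.vstack([[0.0, 0.0, H1], base])

def triangle_vertices(L=0.5):
    """Upper vertex (0, 0, L*cos(pi/3)) plus left/right base vertices (0, +/-L/2, 0)."""
    return np.array([[0.0, 0.0, L * np.cos(np.pi / 3)],
                     [0.0,  L / 2, 0.0],
                     [0.0, -L / 2, 0.0]])

def cylinder_vertices(H2=0.125, R2=0.356, m=36):
    """m equal-division points on the bottom circle and m on the top circle."""
    k = np.arange(m)
    x, y = R2 * np.cos(2 * np.pi * k / m), R2 * np.sin(2 * np.pi * k / m)
    bottom = np.stack([x, y, np.zeros(m)], axis=1)
    top = np.stack([x, y, np.full(m, H2)], axis=1)
    return np.vstack([bottom, top])
```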
  • It should be noted that, for the implementation and beneficial effects of each module, reference may also be made to the corresponding description of the method embodiment shown in FIG. 4.
  • FIG. 29 is a target detection apparatus 2900 provided by an embodiment of the present application.
  • the apparatus 2900 includes a processor 2901, a communication interface 2903, and optionally, a memory 2902.
  • the processor 2901, the memory 2902 and the communication interface 2903 are connected to each other through a bus 2904.
  • The memory 2902 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a compact disc read-only memory (CD-ROM); the memory 2902 is used to store related computer programs and data.
  • the communication interface 2903 is used to receive and transmit data.
  • The processor 2901 may be one or more central processing units (CPUs). Where the processor 2901 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
  • the processor 2901 in the device 2900 is configured to read the computer program code stored in the memory 2902, and execute the above-mentioned method performed in FIG. 4 .
  • the object detection device can be a vehicle with object detection function, or other components with object detection function.
  • The target detection device includes but is not limited to: a vehicle-mounted terminal, a vehicle-mounted controller, a vehicle-mounted module, vehicle-mounted components, a vehicle-mounted chip, a vehicle-mounted unit, a vehicle-mounted radar, a vehicle-mounted camera or other sensors. The vehicle can implement the method provided in this application through the vehicle-mounted terminal, vehicle-mounted controller, vehicle-mounted module, vehicle-mounted component, vehicle-mounted chip, vehicle-mounted unit, vehicle-mounted radar or camera.
  • the target detection device can also be other intelligent terminals with target detection function other than the vehicle, or set in other intelligent terminals with target detection function other than the vehicle, or set in a component of the intelligent terminal.
  • the intelligent terminal may be other terminal equipment such as intelligent transportation equipment, smart home equipment, and robots.
  • the target detection device includes, but is not limited to, a smart terminal or a controller, a chip, other sensors such as radar or a camera, and other components in the smart terminal.
  • the target detection device may be a general-purpose device or a special-purpose device.
  • In a specific implementation, the apparatus can also be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, or another device with processing functions.
  • the embodiment of the present application does not limit the type of the target detection device.
  • the target detection device may also be a chip or processor with a processing function, and the target detection device may include multiple processors.
  • the processor can be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • the chip or processor with processing function may be arranged in the sensor, or may not be arranged in the sensor, but arranged at the receiving end of the output signal of the sensor.
  • An embodiment of the present application further provides a chip system, the chip system includes at least one processor and a communication interface, the at least one processor is configured to call a computer program from the communication interface, and when the processor executes the instruction , the method flow shown in FIG. 4 is realized.
  • Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is run on a computer, the method flow shown in FIG. 4 is implemented.
  • the embodiment of the present application further provides a computer program product, when the computer program product runs on a computer, the method flow shown in FIG. 4 is realized.
  • Embodiments of the present application further provide a vehicle, the vehicle including at least one target detection device.
  • Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware; the computer program can be stored in a computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments.
  • The aforementioned storage medium includes various media that can store computer program code, such as a ROM, a random access memory (RAM), a magnetic disk or an optical disk.


Abstract

A target detection method and apparatus, the method including: acquiring a first image (S401); processing the first image to obtain Q regions of interest, and determining the coordinates, in the image coordinate system, of the reference point in each region of interest (S402); determining, according to the coordinates of the reference point in the image coordinate system, the coordinates of the reference point in the vehicle body coordinate system (S403); determining a three-dimensional model of a first target object (S404); determining, according to the coordinates of the reference point in the vehicle body coordinate system and the vertex set of the three-dimensional model, at least one coordinate of the three-dimensional model in the vehicle body coordinate system (S405); projecting the at least one coordinate onto the image coordinate system to obtain Q pixel regions (S406); and determining the detection result of the target according to the Q regions of interest and the Q pixel regions (S407). The method can detect obstacles in a traffic scene in real time, improving the capability of advanced driver assistance systems in autonomous or assisted driving.

Description

Target detection method and apparatus
This application claims priority to Chinese Patent Application No. 202110026498.9, filed with the China National Intellectual Property Administration on January 8, 2021 and entitled "Target detection method and apparatus", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of image processing technologies, and in particular, to a target detection method and apparatus.
Background
With the development of society, intelligent terminals such as intelligent transportation devices, smart home devices and robots are gradually entering people's daily lives. Sensors play a very important role on intelligent terminals. Various sensors installed on an intelligent terminal, such as millimeter-wave radar, lidar, cameras and ultrasonic radar, perceive the surrounding environment while the intelligent terminal moves, collect data, identify and track moving objects, measure speed and distance, recognize and locate static scenes such as lane lines and traffic-scene objects, and perform path planning and other behavior control in combination with the navigator and map data.
In a typical traffic scene, there are objects that occupy the drivable road surface and hinder the vehicle from moving forward, as shown in FIG. 1, for example traffic cones, warning triangles for motor vehicles, flat-lying tires, and so on. How to detect these objects in real time and provide important information for subsequent path planning is a technical problem that urgently needs to be solved.
Summary
Embodiments of this application disclose a target detection method and apparatus, which can detect obstacle objects in a traffic scene in real time and provide important information for subsequent path planning, improving the capability of advanced driver assistance systems in autonomous or assisted driving.
A first aspect of the embodiments of this application discloses a target detection method, including: acquiring a first image; processing the first image to obtain Q regions of interest, and determining the coordinates, in the image coordinate system, of the reference point in each region of interest, where Q is a positive integer; determining, according to the coordinates of the reference point in each region of interest in the image coordinate system, the coordinates of the reference point in each region of interest in the vehicle body coordinate system; determining a three-dimensional model of a first target object; determining, according to the coordinates of the reference point in each region of interest in the vehicle body coordinate system and the vertex set of the three-dimensional model of the first target object, at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system; projecting the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system onto the image coordinate system to obtain Q pixel regions; and determining the detection result of the target according to the Q regions of interest and the Q pixel regions.
With the method of the embodiments of this application, since traffic-scene obstacle objects such as traffic cones, warning triangles for motor vehicles and tires have clearly specified parameters in international or national standards, a three-dimensional model coordinate system can be defined with reference to these traffic-scene obstacle objects, so as to obtain the three-dimensional model of the first target object. In the embodiments of this application, the three-dimensional model of the first target object is placed at the vehicle body coordinates corresponding to the reference point in each region of interest, so as to obtain the coordinates of the three-dimensional model of the first target object in the vehicle body coordinate system. In this way, even if the shape and size of the above traffic-scene obstacle objects vary to a certain extent, as long as the overall shape does not change severely, the traffic-scene obstacle objects can still be detected by the embodiments of this application. Moreover, in the embodiments of this application, the reference point of the region of interest is converted from the image coordinate system to the vehicle body coordinate system, the three-dimensional model of the first target object is then placed at the vehicle body coordinates corresponding to the reference point to determine the coordinates of the three-dimensional model in the vehicle body coordinate system, and the coordinates of the three-dimensional model in the vehicle body coordinate system are then projected onto the image coordinate system for matching. That is to say, the embodiments of this application achieve detection by processing and analyzing the imaging characteristics of the first target object, without requiring a large number of pre-collected and labeled training samples for training, and the computational complexity is low.
In a possible implementation, determining the detection result of the target according to the Q regions of interest and the Q pixel regions includes: screening the Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest, where each pixel region corresponds to one region of interest, R is a positive integer, and R is less than or equal to Q; determining R circumscribed rectangles corresponding to the R regions of interest; and determining the detection result of the target according to the R circumscribed rectangles.
In the above method, screening the Q regions of interest to obtain R regions of interest can effectively remove image noise and improve the accuracy of the target detection result. Determining the detection result of the target according to the R circumscribed rectangles may be done by processing the R circumscribed rectangles with a non-maximum suppression algorithm to remove redundant circumscribed rectangles and finally determine the detection result of the target, which speeds up target detection.
In still another possible implementation, screening the Q regions of interest corresponding to the Q pixel regions to obtain the R regions of interest includes at least one of the following: when the ratio of the area of a first region of interest among the Q regions of interest to the area of the convex hull of the first region of interest is greater than a first preset value, taking the first region of interest as one of the R regions of interest; when the ratio of the convex hull aspect ratio of the first region of interest among the Q regions of interest to the aspect ratio of the pixel region corresponding to the first region of interest is greater than a second preset value and less than a third preset value, taking the first region of interest as one of the R regions of interest; when the ratio of the convex hull area of the first region of interest among the Q regions of interest to the area of the pixel region corresponding to the first region of interest is greater than the second preset value and less than the third preset value, taking the first region of interest as one of the R regions of interest; when the intersection-over-union IoU of the convex hull contour of the first region of interest among the Q regions of interest and the contour of the pixel region corresponding to the first region of interest is greater than a fourth preset value, taking the first region of interest as one of the R regions of interest.
In still another possible implementation, screening the Q regions of interest corresponding to the Q pixel regions to obtain the R regions of interest includes: using a pre-trained classifier to evaluate and score the content in the circumscribed rectangle of a first region of interest among the Q regions of interest to obtain a first score; and, when the first score is higher than a fifth preset value, taking the first region of interest as one of the R regions of interest.
In still another possible implementation, determining the detection result of the target according to the R circumscribed rectangles includes: calculating the areas of the R circumscribed rectangles; and determining the detection result of the target according to the areas of the R circumscribed rectangles.
In the above method, the detection result of the target can be quickly determined from the areas of the R circumscribed rectangles, which speeds up target detection.
In still another possible implementation, determining the detection result of the target according to the R circumscribed rectangles includes: using a pre-trained classifier to evaluate and score the contents of the R circumscribed rectangles to obtain R scores; and determining the detection result of the target according to the R scores.
A second aspect of the embodiments of this application discloses a target detection apparatus, including: an acquisition module configured to acquire a first image; and a processing module configured to process the first image to obtain Q regions of interest and determine the coordinates, in the image coordinate system, of the reference point in each region of interest, where Q is a positive integer. The processing module is configured to determine, according to the coordinates of the reference point in each region of interest in the image coordinate system, the coordinates of the reference point in each region of interest in the vehicle body coordinate system; the processing module is configured to determine a three-dimensional model of a first target object; the processing module is configured to determine, according to the coordinates of the reference point in each region of interest in the vehicle body coordinate system and the vertex set of the three-dimensional model of the first target object, at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system; the processing module is configured to project the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system onto the image coordinate system to obtain Q pixel regions; and the processing module is configured to determine the detection result of the target according to the at least one region of interest and the Q pixel regions.
In a possible implementation, the processing module is further configured to screen the Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest, where each pixel region corresponds to one region of interest, R is a positive integer and R is less than or equal to Q; determine R circumscribed rectangles corresponding to the R regions of interest; and determine the detection result of the target according to the R circumscribed rectangles.
In still another possible implementation, the processing module is further configured to: when the ratio of the area of a first region of interest among the Q regions of interest to the area of the convex hull of the first region of interest is greater than a first preset value, take the first region of interest as one of the R regions of interest; when the ratio of the convex hull aspect ratio of the first region of interest among the Q regions of interest to the aspect ratio of the pixel region corresponding to the first region of interest is greater than a second preset value and less than a third preset value, take the first region of interest as one of the R regions of interest; when the ratio of the convex hull area of the first region of interest among the Q regions of interest to the area of the pixel region corresponding to the first region of interest is greater than the second preset value and less than the third preset value, take the first region of interest as one of the R regions of interest; and when the IoU of the convex hull contour of the first region of interest among the Q regions of interest and the contour of the pixel region corresponding to the first region of interest is greater than a fourth preset value, take the first region of interest as one of the R regions of interest.
In still another possible implementation, the processing module is further configured to use a pre-trained classifier to evaluate and score the content in the circumscribed rectangle of a first region of interest among the Q regions of interest to obtain a first score; and, when the first score is higher than a fifth preset value, take the first region of interest as one of the R regions of interest.
In still another possible implementation, the processing module is further configured to calculate the areas of the R circumscribed rectangles, and determine the detection result of the target according to the areas of the R circumscribed rectangles.
In still another possible implementation, the processing module is further configured to use a pre-trained classifier to evaluate and score the contents of the R circumscribed rectangles to obtain R scores, and determine the detection result of the target according to the R scores.
For the technical effects brought by the second aspect or its possible implementations, reference may be made to the description of the technical effects of the first aspect or the corresponding implementations.
A third aspect of the embodiments of this application discloses a target detection apparatus, including a processor and a memory, where the memory is configured to store one or more programs, the one or more programs including computer-executable instructions, and the processor is configured to call the one or more programs stored in the memory to perform the following operations: acquiring a first image; processing the first image to obtain Q regions of interest, and determining the coordinates, in the image coordinate system, of the reference point in each region of interest; determining, according to the coordinates of the reference point in each region of interest in the image coordinate system, the coordinates of the reference point in each region of interest in the vehicle body coordinate system; determining a three-dimensional model of a first target object; determining, according to the coordinates of the reference point in each region of interest in the vehicle body coordinate system and the vertex set of the three-dimensional model of the first target object, at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system; projecting the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system onto the image coordinate system to obtain Q pixel regions; and determining the detection result of the target according to the Q regions of interest and the Q pixel regions, where Q is a positive integer.
In a possible implementation, the at least one processor is further configured to screen the Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest, where each pixel region corresponds to one region of interest, R is a positive integer, and R is less than or equal to Q; determine R circumscribed rectangles corresponding to the R regions of interest; and determine the detection result of the target according to the R circumscribed rectangles.
In still another possible implementation, the at least one processor is further configured to: when the ratio of the area of a first region of interest among the Q regions of interest to the area of the convex hull of the first region of interest is greater than a first preset value, take the first region of interest as one of the R regions of interest; when the ratio of the convex hull aspect ratio of the first region of interest among the Q regions of interest to the aspect ratio of the pixel region corresponding to the first region of interest is greater than a second preset value and less than a third preset value, take the first region of interest as one of the R regions of interest; when the ratio of the convex hull area of the first region of interest among the Q regions of interest to the area of the pixel region corresponding to the first region of interest is greater than the second preset value and less than the third preset value, take the first region of interest as one of the R regions of interest; and when the IoU of the convex hull contour of the first region of interest among the Q regions of interest and the contour of the pixel region corresponding to the first region of interest is greater than a fourth preset value, take the first region of interest as one of the R regions of interest.
In still another possible implementation, the at least one processor is further configured to use a pre-trained classifier to evaluate and score the content in the circumscribed rectangle of a first region of interest among the Q regions of interest to obtain a first score; when the first score is higher than a fifth preset value, take the first region of interest as one of the R regions of interest.
In still another possible implementation, the at least one processor is further configured to calculate the areas of the R circumscribed rectangles, and determine the detection result of the target according to the areas of the R circumscribed rectangles.
In still another possible implementation, the at least one processor is further configured to use a pre-trained classifier to evaluate and score the contents of the R circumscribed rectangles to obtain R scores, and determine the detection result of the target according to the R scores.
For the technical effects brought by the third aspect or its possible implementations, reference may be made to the description of the technical effects of the first aspect or the corresponding implementations.
With reference to any one of the above aspects or any possible implementation of any aspect, in still another possible implementation, the vertex set of the three-dimensional model of the first target object includes: a first upper vertex (0, 0, H1) and n equal-division points on the first bottom circle corresponding to the first upper vertex, (R1·cos(2πk/n), R1·sin(2πk/n), 0), where H1 denotes the first height, R1 denotes the radius of the first bottom circle corresponding to the first upper vertex, the three-dimensional model coordinate system is defined with the center of the first bottom circle as the origin (0, 0, 0), the X axis pointing forward, the Y axis pointing left and the Z axis pointing up, k = 0, 1, 2, …, n−1, and n is a positive integer.
With reference to any one of the above aspects or any possible implementation of any aspect, in still another possible implementation, the vertex set of the three-dimensional model of the first target object includes: a second upper vertex (0, 0, L·cos(π/3)), a left vertex (0, L/2, 0) and a right vertex (0, −L/2, 0), where L denotes the side length, and the three-dimensional model coordinate system is defined with the center of the base edge as the origin (0, 0, 0), the X axis pointing forward, the Y axis pointing left and the Z axis pointing up.
With reference to any one of the above aspects or any possible implementation of any aspect, in still another possible implementation, the vertex set of the three-dimensional model of the first target object includes: m equal-division points on the second bottom circle, (R2·cos(2πk/m), R2·sin(2πk/m), 0), and m equal-division points on the top circle corresponding to the second bottom circle, (R2·cos(2πk/m), R2·sin(2πk/m), H2), where H2 denotes the second height, R2 denotes the radius of the second bottom circle, k = 0, 1, 2, …, m−1, and m is a positive integer.
A fourth aspect of the embodiments of this application discloses a chip system, the chip system including at least one processor and an acquisition interface, where the at least one processor is configured to call a computer program from the acquisition interface to implement the method described in any one of the above aspects or any possible implementation thereof.
A fifth aspect of the embodiments of this application discloses a computer-readable storage medium storing a computer program which, when run on a computer, implements the method described in any one of the above aspects or any possible implementation thereof.
A sixth aspect of the embodiments of this application discloses a vehicle, the vehicle including the target detection apparatus of the second aspect or the target detection apparatus of the third aspect.
Brief Description of Drawings
The drawings used in the embodiments of this application are introduced below.
FIG. 1 is a schematic diagram of a target obstacle in a traffic scene provided by an embodiment of this application;
FIG. 2 is a schematic structural diagram of a target detection system provided by an embodiment of this application;
FIG. 3 is an example diagram of a traffic scene provided by an embodiment of this application;
FIG. 4 is a schematic flowchart of a target detection method provided by an embodiment of this application;
FIG. 5 is a schematic diagram of an image coordinate system provided by an embodiment of this application;
FIG. 6 is a schematic diagram of a first image provided by an embodiment of this application;
FIG. 7 is a schematic diagram of another first image provided by an embodiment of this application;
FIG. 8 is a schematic diagram of another first image provided by an embodiment of this application;
FIG. 9 is a schematic diagram of regions of interest obtained by processing a first image, provided by an embodiment of this application;
FIG. 10 is a schematic diagram of other regions of interest obtained by processing a first image, provided by an embodiment of this application;
FIG. 11 is a schematic diagram of other regions of interest obtained by processing a first image, provided by an embodiment of this application;
FIG. 12 is a schematic diagram of a back-projection process provided by an embodiment of this application;
FIG. 13 is a schematic diagram of a camera coordinate system and an image coordinate system provided by an embodiment of this application;
FIG. 14 shows a correspondence, provided by an embodiment of this application, between the positions of the coordinates of reference points in the image coordinate system and the positions of the coordinates of the reference points in the vehicle body coordinate system;
FIG. 15 is a schematic diagram of a three-dimensional model of a first target object provided by an embodiment of this application;
FIG. 16 is a schematic diagram of another three-dimensional model of a first target object provided by an embodiment of this application;
FIG. 17 is a schematic diagram of another three-dimensional model of a first target object provided by an embodiment of this application;
FIG. 18 shows a process, provided by an embodiment of this application, of placing the three-dimensional model of the first target object at the positions of the coordinates of the reference points in the vehicle body coordinate system;
FIG. 19 is a schematic diagram of back-projection provided by an embodiment of this application;
FIG. 20 is a schematic diagram of pixel regions provided by an embodiment of this application;
FIG. 21 is a schematic diagram of pixel regions illustrated in a first image, provided by an embodiment of this application;
FIG. 22 is a schematic diagram of other pixel regions provided by an embodiment of this application;
FIG. 23 is a schematic diagram of other pixel regions provided by an embodiment of this application;
FIG. 24 is a schematic diagram of circumscribed rectangles provided by an embodiment of this application;
FIG. 25 is a schematic diagram of a detection result of a target provided by an embodiment of this application;
FIG. 26 is a schematic diagram of a detection result of a target provided by an embodiment of this application;
FIG. 27 is a schematic diagram of a detection result of a target provided by an embodiment of this application;
FIG. 28 is a schematic diagram of a target detection apparatus provided by an embodiment of this application;
FIG. 29 is a schematic diagram of a target detection apparatus provided by an embodiment of this application.
Detailed Description
The embodiments of this application are described below with reference to the drawings in the embodiments of this application.
Referring to FIG. 2, FIG. 2 is a schematic structural diagram of a target detection system 2000 provided by an embodiment of this application. The system includes an acquisition module 2001, a processing module 2002, and a planning and control module 2003, where the acquisition module 2001 is configured to acquire an image to be detected; the processing module 2002 is configured to detect obstacles in the image to be detected acquired by the acquisition module 2001; and the planning and control module 2003 is configured to receive the output of the processing module 2002 and plan and control the behavior of the movable platform itself. The system 2000 can be applied to movable platforms such as vehicles and robots.
Some terms used in this application are explained below for ease of understanding.
3-dimension projection: the process of mapping points in three-dimensional space onto a two-dimensional plane. In the field of computer vision, 3D projection mainly refers to the process of mapping points in world space onto the two-dimensional image plane through a camera model (for example, a pinhole model); the world space may be the vehicle body coordinate system.
Back-projection: the inverse process of 3D projection, i.e., the process of mapping points in a two-dimensional plane into three-dimensional space. In computer vision, back-projection mainly refers to the process of mapping points in the two-dimensional image plane into world space through a camera model and some geometric constraints (for example, the flat-earth assumption); the world space may be the vehicle body coordinate system.
Flat-earth assumption: the road surface on which the ego vehicle travels is considered to be an ideal plane. Based on this assumption, back-projection can be realized: starting from a pixel belonging to the road surface in the two-dimensional image plane, the corresponding point on the ideal ground plane in world space — a plane defined in the vehicle body coordinate system — can be found.
Convex hull: given a point set, the convex hull of the point set is the convex polygon of minimum area that contains all points in the set. Intuitively, a convex polygon is a polygon without any concave parts.
Non-maximum suppression (NMS): an algorithm that searches for local maxima and removes non-maxima, commonly used in target detection for post-processing detection boxes. The input of the algorithm is a set of candidate boxes and the score corresponding to each candidate box, and the output is a subset of the candidate boxes. The specific steps are: first, mark all boxes as not suppressed and sort all boxes by score in descending order; traverse starting from the box with the highest score, and for each box that is not suppressed, mark as suppressed all boxes whose degree of overlap with it is greater than a threshold; finally, return the boxes that are not suppressed.
Intersection-over-union (IoU): a concept used in target detection, namely the overlap rate of two regions; simply put, the ratio of the intersection area to the union area of the two regions. Taking a first region C and a second region G as an example, the IoU between them is calculated as:
IoU = S(C∩G) / S(C∪G)
where S(C∩G) is the area of the intersection of the first region C and the second region G, and S(C∪G) is the area of the union of the first region C and the second region G. In one example, when the IoU of the convex hull contour of the first region of interest among the Q regions of interest and the contour of the pixel region corresponding to the first region of interest is greater than a fourth preset value (for example, 70%), the first region of interest is taken as one of the R regions of interest; here, the region enclosed by the convex hull contour of the first region of interest is the first region, and the region enclosed by the contour of the corresponding pixel region may be the second region.
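A minimal sketch of this IoU computation for two regions given as boolean masks; the names are illustrative:

```python
import numpy as np

def iou(region_c, region_g):
    """IoU = S(C∩G) / S(C∪G) for two boolean region masks of equal shape."""
    inter = np.logical_and(region_c, region_g).sum()
    union = np.logical_or(region_c, region_g).sum()
    return inter / union if union > 0 else 0.0
```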
Clustering: the process of dividing a collection of physical or abstract objects into multiple classes composed of similar objects. A cluster generated by clustering is a set of data objects that are similar to the objects in the same cluster and different from the objects in other clusters.
Related technologies for real-time detection of target obstacles in traffic scenes mainly fall into two categories. The first category: using sensors such as millimeter-wave radar and lidar, and based on physical or geometric principles, measuring information such as the distance, speed and azimuth of surrounding objects, or forming a three-dimensional point cloud or depth information of the surrounding environment, to detect target obstacles. The second category: using visible-light images captured by a camera, and based on the processing, analysis and learning of the imaging characteristics of target objects, recognizing the objects in the picture, so as to detect target obstacles in the image.
In one method, taking a traffic cone as the target obstacle in a traffic scene as an example: based on supervised learning, a detection and recognition model is trained with pre-collected and labeled training samples of traffic cones, and candidate target regions in the picture are then detected and recognized, so as to detect the traffic cone. The specific process is as follows: first, obtain the squeeze-and-excitation network SENet and the densely connected convolutional network DenseNet, then determine a target network structure based on SENet, DenseNet and a preset target detection model — that is, design a cascade network structure; then train the target network structure with multiple original traffic-cone scene images containing traffic cones to obtain a traffic-cone recognition model; then input the image to be recognized into the traffic-cone recognition model and output the recognition result, which is either that there is a traffic cone in the image together with its position in the image, or that there is no traffic cone in the image. However, detecting target obstacles in this way has the following disadvantages: the algorithmic complexity of the cascade network framework is high and demands considerable computing power, and if the computing power of the platform is limited, it is difficult to deploy or achieve real-time, effective detection. Moreover, the traditional target detection framework based on supervised learning needs to train a model and depends heavily on the quantity and distribution of the data, so a certain scale of data collection and labeling is required for each type of target obstacle; when the quantity and distribution of training samples are insufficient, effective real-time detection is hard to achieve.
In yet another method, based on the premise that, under normal circumstances, the optical flow belonging to the same foreground object converges while being clearly inconsistent with the optical flow of the background, the target obstacle is detected by clustering the optical flow in the scene. The specific process is as follows. First, compute the optical flow between adjacent frames. Then, based on the optical-flow field, cluster pixels that are close in position and have similar displacement vectors; the clustering criteria are whether they share a common focus of expansion of the optical flow and a common scale magnitude. Then, output these clustered optical-flow clusters as foreground target regions, i.e., the target obstacle detection result. Part 3-1 of FIG. 3 shows an example traffic scene, containing foreground objects such as a pedestrian (320) and a car (310), as well as a static background (for example, the ground and roads); part 3-2 of FIG. 3 shows the corresponding feature space (composed of three dimensions: the horizontal coordinate X of the focus of expansion, the vertical coordinate Y of the focus of expansion, and the optical-flow vector scale), which contains the feature point (350) corresponding to the pedestrian, the feature point (340) corresponding to the car, and the feature subspace (330) corresponding to the static background. The optical flow produced by the pedestrian crossing the road has a common focus of expansion and the same scale (S2), and therefore forms a feature point (350) in the feature space. The optical flow of the static background has a common focus of expansion but a wide distribution of scales, so its corresponding feature subspace (330) is cylindrical rather than point-like. Since the focus of expansion of the pedestrian's optical flow is clearly different from that of the static background, the pedestrian's feature point (350) and the static background's feature subspace (330) are far apart in the feature space and easy to distinguish. That is, by clustering optical flow in the above feature space, the pedestrian in the example can be effectively detected. However, detection in this way has the following disadvantages: being based on global dense optical flow, the computational overhead is large, challenging the real-time performance of the algorithm; the primary premise for an object to be detected is that it has corresponding optical flow, so if optical-flow computation fails for common reasons such as uniform or repetitive object texture, motion blur, inter-frame displacement beyond the search range, or the object being relatively stationary with respect to the ego vehicle, the object cannot be detected; detection performance depends on the accuracy of the optical flow, and if the estimation of the focus of expansion and the scale is not accurate enough, effective clustering cannot be formed; and for objects directly in front of the vehicle whose motion is stationary or parallel to the vehicle's direction of motion, effective detection cannot be formed. As shown in FIG. 3, the optical flow produced by the car (310) ahead has a common focus of expansion and the same scale (S1), and therefore forms a feature point (340) in the feature space. However, since this car is in the above-mentioned motion state (stationary or parallel to the vehicle's direction of motion), its focus of expansion coincides exactly with that of the static background, so that in the feature space its feature point (340) is contained in the feature subspace (330) corresponding to the static background. That is, by clustering optical flow in the above feature space, the car in the example cannot be effectively detected.
Based on this, the embodiments of this application propose the following solution.
The method is executed by a movable platform, which may be a vehicle, a robot, or the like.
Referring to FIG. 4, FIG. 4 shows a target detection method provided by an embodiment of this application. The method includes but is not limited to the following steps:
Step S401: Acquire a first image.
Specifically, there are two ways to acquire the first image: when the method of this embodiment is applied in the chip of a camera, acquiring the first image means the image captured by the camera; when the method is applied in a chip other than the camera, acquiring the first image means receiving the image from the camera. The camera may be a monocular camera, a binocular camera, a multi-camera setup, a surround-view camera, or the like, which is not limited here.
Step S402: Process the first image to obtain Q regions of interest, and determine the coordinates, in the image coordinate system, of the reference point in each region of interest.
Specifically, Q is a positive integer. The first image may be processed to obtain the Q regions of interest by restricting the color range of the first image or by extracting edges. The process of restricting the color range is as follows: convert the color space of the first image to hue, saturation, value (HSV), and then restrict the ranges of the three dimensions hue, saturation and value to obtain the Q regions of interest. The process of extracting edges is as follows: generally, the pixel-value distributions of an object and of the road surface differ, so obvious edge features caused by large changes in pixel values often appear at the object boundary; the Q regions of interest can therefore be obtained by processing the first image with an edge operator.
Specifically, the reference point in each region of interest may be the bottom midpoint of the region of interest, or another point in the region of interest, which is not limited here, as long as a uniform point-selection standard is adopted. The image coordinate system can be divided into the image pixel coordinate system and the image physical coordinate system. The origin O_1 of the image physical coordinate system is the principal point, i.e., the intersection of the camera optical axis and the imaging plane; its X and Y axes are respectively parallel to the Xc and Yc axes of the camera coordinate system, forming a planar rectangular coordinate system. The image pixel coordinate system is a planar rectangular coordinate system fixed on the image with pixels as the unit; its origin O_0 is at the upper-left corner of the image, the u and v axes are respectively parallel to the X and Y axes of the image physical coordinate system, and the coordinates of the principal point in the u-v coordinate system are (u_0, v_0), as shown in FIG. 5. The coordinates of a point on the image in the image physical coordinate system and in the image pixel coordinate system can be converted into each other.
The specific conversion is as follows. Suppose a point on the image has coordinates (x, y) in the image physical coordinate system and (u, v) in the image pixel coordinate system, (u_0, v_0) are the pixel coordinates of the principal point in the image pixel coordinate system, and du and dv are the physical sizes of one pixel along the X axis and the Y axis respectively. Then:
u = x/du + u_0,  v = y/dv + v_0    (1)
The above formula (1) can be expressed in matrix-multiplication form as follows:
[u, v, 1]^T = [[1/du, 0, u_0], [0, 1/dv, v_0], [0, 0, 1]] · [x, y, 1]^T    (2)
Specific examples of processing the first image to obtain the Q regions of interest are as follows.
In one example, the first image is shown in FIG. 6 and contains a traffic cone. The color space of the first image is converted to HSV, and the ranges of hue H, saturation S and value V are restricted. In this embodiment, the orange-red subspace of the HSV color space is restricted to 0° ≤ H ≤ 10° or 160° ≤ H ≤ 180°; 70 ≤ S ≤ 255; 100 ≤ V ≤ 255, so as to obtain the Q regions of interest.
In yet another example, the first image is shown in FIG. 7 and contains a warning triangle for motor vehicles. The color space of the first image is converted to HSV, and the ranges of hue H, saturation S and value V are restricted. In this embodiment, the orange-red subspace of the HSV color space is restricted to 0° ≤ H ≤ 10° or 160° ≤ H ≤ 180°; 70 ≤ S ≤ 255; 100 ≤ V ≤ 255, so as to obtain the Q regions of interest.
In yet another example, the first image is shown in FIG. 8 and contains a flat-lying tire. The color space of the first image is converted to HSV, and the ranges of hue H, saturation S and value V are restricted. In this embodiment, the black subspace of the HSV color space is restricted to 0° ≤ H ≤ 120°; 0 ≤ S ≤ 100; 0 ≤ V ≤ 100, so as to obtain the Q regions of interest.
Optionally, in a possible implementation, after the first image is processed to obtain one or more regions of interest, if a region of interest among them contains fewer than S pixels, that region of interest is filtered out. Specifically, S is a positive integer.
In one example, suppose the first image is as shown in FIG. 6 and S is 5. Processing the first image yields Q+1 regions of interest, one of which contains 2 pixels; that region of interest is filtered out, and the finally obtained Q regions of interest are shown in FIG. 9.
In yet another example, suppose the first image is as shown in FIG. 7 and S is 5. Processing the first image yields Q+2 regions of interest, of which one contains 2 pixels and one contains 3 pixels; both are filtered out, and the finally obtained Q regions of interest are shown in FIG. 10.
In yet another example, suppose the first image is as shown in FIG. 8 and S is 15. Processing the first image yields Q+1 regions of interest, one of which contains 10 pixels; it is filtered out, and the finally obtained Q regions of interest are shown in FIG. 11.
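As an illustration of the color-range restriction and small-region filtering described above, a minimal sketch with OpenCV follows. Note that OpenCV stores hue in [0, 180] (half the degree value), so the 0°–10° and 160°–180° ranges above become 0–5 and 80–90; the thresholds, the default S = 5, and all names are the illustrative values from the examples:

```python
import cv2
import numpy as np

def orange_red_rois(bgr_image, min_pixels=5):
    """Restrict the HSV color range (orange-red subspace) and drop regions with
    fewer than `min_pixels` pixels; returns one binary mask per region of interest."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    low = cv2.inRange(hsv, (0, 70, 100), (5, 255, 255))     # hue 0..10 degrees
    high = cv2.inRange(hsv, (80, 70, 100), (90, 255, 255))  # hue 160..180 degrees
    mask = cv2.bitwise_or(low, high)

    # Split the mask into connected regions of interest and filter out tiny ones.
    num_labels, labels = cv2.connectedComponents(mask)
    rois = []
    for lab in range(1, num_labels):                 # label 0 is the background
        roi = np.uint8(labels == lab) * 255
        if cv2.countNonZero(roi) >= min_pixels:
            rois.append(roi)
    return rois
```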
Step S403: Determine, according to the coordinates of the reference point in each region of interest in the image coordinate system, the coordinates of the reference point in each region of interest in the vehicle body coordinate system.
Specifically, this process may be referred to as back-projection. As shown in FIG. 12, assuming that the reference point in each region of interest corresponds to an object on the ground, the purpose of back-projection is to find the coordinates, in the vehicle body coordinate system, of this assumed ground object. In this embodiment, the origin O_3 of the vehicle body coordinate system is the projection of the center of the rear axle of the ego vehicle onto the ideal ground plane, i.e., a plane defined in the vehicle body coordinate system; the X_w axis points forward, the Y_w axis points left, and the Z_w axis points up.
Specifically, according to the extrinsic parameters, the intrinsic matrix and the scale parameter of the camera, the coordinates of the reference point in each region of interest in the image coordinate system are converted into the coordinates of the reference point in the vehicle body coordinate system; that is, the reference point in each region of interest is converted from the image coordinate system to the vehicle body coordinate system. The specific conversion is as follows.
First, normalize the reference point in each region of interest in the image coordinate system:
E_norm = K^(−1) · e    (3)
where E_norm denotes the normalized coordinates of the reference point in each region of interest, K denotes the intrinsic matrix of the camera, and e denotes the coordinates of the reference point in the image coordinate system.
Then, as shown in FIG. 13, starting from the camera origin O_c and passing through the reference point in each region of interest, a ray in the camera coordinate system is obtained, with the expression Ray(t_1) = (x_i·t_1, y_i·t_1, t_1), where x_i and y_i denote the horizontal and vertical coordinates of the reference point in the image coordinate system and t_1 denotes a coefficient.
An ideal ground plane is defined in the vehicle body coordinate system. Since the origin O_3 of the vehicle body coordinate system is the projection of the center of the rear axle of the ego vehicle onto the ideal ground plane, with the X_w axis pointing forward, the Y_w axis pointing left and the Z_w axis pointing up, the ideal ground plane in the vehicle body coordinate system can be determined by the normal vector n = [0, 0, 1] and the origin O_3 (0, 0, 0). Given the transformation matrix from the vehicle body coordinate system to the camera coordinate system, the normal vector n and the origin O_3 are converted from the vehicle body coordinate system to the camera coordinate system to obtain the corresponding expression of the ideal ground plane in the camera coordinate system: Ax + By + Cz + D = 0, where A, B, C and D are known constants, and A, B and C are not all zero.
From the ray expression and the expression of the ideal ground plane in the camera coordinate system:
t_1 = −D / (A·x_i + B·y_i + C)    (4)
Substituting t_1 into the ray expression then gives the intersection of the ray and the plane, which is the point in the camera coordinate system corresponding to the reference point in each region of interest in the image coordinate system; then, through the transformation matrix from the camera coordinate system to the vehicle body coordinate system, the point corresponding to the reference point in the vehicle body coordinate system can be obtained. That is, through the above process, the coordinates of the reference point in each region of interest in the vehicle body coordinate system are determined from the coordinates of the reference point in the image coordinate system.
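A minimal sketch of the back-projection just derived, assuming K is the 3×3 intrinsic matrix and T_cam_from_body the 4×4 transform from the vehicle body coordinate system to the camera coordinate system; all function and variable names are illustrative:

```python
import numpy as np

def back_project(u, v, K, T_cam_from_body):
    """Map an image pixel assumed to lie on the ideal ground plane (Z_w = 0)
    back to the vehicle body coordinate system, per formulas (3) and (4)."""
    # Formula (3): normalize the pixel, giving the ray direction (x_i, y_i, 1).
    x_i, y_i, _ = np.linalg.inv(K) @ np.array([u, v, 1.0])

    # Express the ground plane (normal n = [0,0,1] through the body origin) in the
    # camera frame: points X_c on the plane satisfy A*x + B*y + C*z + D = 0.
    R, t = T_cam_from_body[:3, :3], T_cam_from_body[:3, 3]
    n_cam = R @ np.array([0.0, 0.0, 1.0])   # plane normal in the camera frame
    A, B, C = n_cam
    D = -n_cam @ t                          # body origin expressed in the camera frame is t

    # Formula (4): intersect the ray Ray(t1) = (x_i*t1, y_i*t1, t1) with the plane.
    t1 = -D / (A * x_i + B * y_i + C)
    point_cam = np.array([x_i * t1, y_i * t1, t1])

    # Transform the intersection back to the vehicle body coordinate system.
    T_body_from_cam = np.linalg.inv(T_cam_from_body)
    return (T_body_from_cam @ np.append(point_cam, 1.0))[:3]
```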
In one example, as shown in FIG. 14, suppose the positions of the image-coordinate-system coordinates of the reference points in the regions of interest are 1001, 1002, 1003, 1004, 1005 and 1006. According to the extrinsic parameters, the intrinsic matrix and the scale parameter of the camera, the positions of the corresponding vehicle-body-coordinate-system coordinates of the reference points are determined to be 1007, 1008, 1009, 1010, 1011 and 1012. The specific correspondence between the positions of the image-coordinate-system coordinates and the positions of the vehicle-body-coordinate-system coordinates of the reference points is shown in Table 1.
Table 1
Reference point (image coordinate system) → Reference point (vehicle body coordinate system)
1001 → 1007
1002 → 1008
1003 → 1009
1004 → 1010
1005 → 1011
1006 → 1012
Step S404: Determine the three-dimensional model of the first target object.
Specifically, the three-dimensional model of the first target object may be obtained by three-dimensional modeling of a traffic cone, a warning triangle for motor vehicles, a flat-lying tire, and so on, which is not limited here. Since the specific parameters of traffic cones, warning triangles for motor vehicles and tires are clearly specified in international or national standards, a three-dimensional model coordinate system can be defined with the traffic cone, warning triangle or tire as the reference object, so as to determine the vertex set of the three-dimensional model of the first target object.
In one example, as shown in FIG. 15, a traffic cone can be represented as a cone (201) with bottom radius R1 = 0.15 m and height H1 = 0.7 m. Suppose the three-dimensional model of the first target object is obtained by three-dimensional modeling of the traffic cone (101), specifically as follows: define a three-dimensional model coordinate system with the center of the first bottom circle as the origin (0, 0, 0), the X axis pointing forward, the Y axis pointing left and the Z axis pointing up. The vertex set of the three-dimensional model of the first target object then includes a first upper vertex (0, 0, H1) and n equal-division points on the first bottom circle corresponding to the first upper vertex, (R1·cos(2πk/n), R1·sin(2πk/n), 0), where H1 denotes the height, R1 denotes the radius of the first bottom circle, k = 0, 1, 2, …, n−1, and n is a positive integer. In this embodiment, n is 36.
In yet another example, as shown in FIG. 16, the warning-triangle model can be represented as an equilateral triangle (203) with side length L = 0.5 m. Suppose the three-dimensional model of the first target object is obtained by three-dimensional modeling of the warning triangle for motor vehicles (103), specifically as follows: define a three-dimensional model coordinate system with the center of the base edge as the origin (0, 0, 0), the X axis pointing forward, the Y axis pointing left and the Z axis pointing up. The vertex set of the three-dimensional model of the first target object then includes a second upper vertex (0, 0, L·cos(π/3)), a left vertex (0, L/2, 0) and a right vertex (0, −L/2, 0), where L denotes the side length.
In yet another example, as shown in FIG. 17, a flat-lying tire can be represented as a cylinder (205) with bottom radius R2 = 0.356 m and height H2 = 0.125 m. Suppose the three-dimensional model of the first target object is obtained by three-dimensional modeling of the flat-lying tire (105), specifically as follows: define a three-dimensional model coordinate system with the center of the second bottom circle as the origin (0, 0, 0), the X axis pointing forward, the Y axis pointing left and the Z axis pointing up. The vertex set of the three-dimensional model of the first target object then includes m equal-division points on the second bottom circle, (R2·cos(2πk/m), R2·sin(2πk/m), 0), and m equal-division points on the top circle corresponding to the second bottom circle, (R2·cos(2πk/m), R2·sin(2πk/m), H2), where H2 denotes the height, R2 denotes the radius of the second bottom circle, k = 0, 1, 2, …, m−1, and m is a positive integer. In this embodiment, m is 36.
Step S405: Determine, according to the coordinates of the reference point in each region of interest in the vehicle body coordinate system and the vertex set of the three-dimensional model of the first target object, the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system.
Specifically, this process can be regarded as placing the three-dimensional model of the first target object at the position of the vehicle-body-coordinate-system coordinates of the reference point in each region of interest; that is, the coordinates of all points in the vertex set of the three-dimensional model of the first target object are translated from the three-dimensional model coordinate system to the vehicle body coordinate system.
In one example, suppose the vehicle-body-coordinate-system coordinates of the reference point in a region of interest are (X_GP, Y_GP, Z_GP), and a point in the vertex set of the three-dimensional model of the first target object has coordinates (P_X, P_Y, P_Z) in the three-dimensional model coordinate system. Then, placing the three-dimensional model of the first target object at the reference point of that region of interest, the coordinates of that point in the vehicle body coordinate system are determined to be (X_GP + P_X, Y_GP + P_Y, Z_GP + P_Z). In this way, the coordinates, in the vehicle body coordinate system, of all points in the vertex set of the three-dimensional model of the first target object can be determined — that is, the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system is determined.
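A minimal sketch of this placement step; `model_vertices` is assumed to be a vertex set such as those defined in step S404, and the names are illustrative:

```python
import numpy as np

def place_model(model_vertices, reference_point_body):
    """Translate the model vertex set from the 3D-model coordinate system into the
    vehicle body coordinate system: (P_X, P_Y, P_Z) -> (X_GP+P_X, Y_GP+P_Y, Z_GP+P_Z)."""
    return np.asarray(model_vertices) + np.asarray(reference_point_body)
```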
In one example, FIG. 18 shows the process of placing the three-dimensional model of the first target object at the positions of the vehicle-body-coordinate-system coordinates of the reference points in the regions of interest; that is, the reference point in each region of interest corresponds to one three-dimensional model of the first target object, and Q regions of interest correspond to Q three-dimensional models of the first target object. The specific correspondence between the positions of the image-coordinate-system coordinates of the reference points, the positions of the vehicle-body-coordinate-system coordinates of the reference points, and the positions of the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system is shown in Table 2.
Table 2
Reference point (image coordinate system) → Reference point (vehicle body coordinate system) → Three-dimensional model (vehicle body coordinate system)
1001 → 1007 → 1107
1002 → 1008 → 1108
1003 → 1009 → 1109
1004 → 1010 → 1110
1005 → 1011 → 1111
1006 → 1012 → 1112
Step S406: Project the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system onto the image coordinate system to obtain Q pixel regions.
Specifically, this process may be referred to as 3D projection. According to the extrinsic parameters, the intrinsic matrix and the scale parameter of the camera, the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system is converted from the vehicle body coordinate system to the image coordinate system; that is, the at least one coordinate of the three-dimensional model of the first target object in the image coordinate system is determined. The contour of the point set of the at least one coordinate of the three-dimensional model in the image coordinate system is then taken, so as to obtain the Q pixel regions, where Q is a positive integer.
Specifically, the at least one coordinate of the three-dimensional model of the first target object in the image coordinate system is determined from its at least one coordinate in the vehicle body coordinate system as follows.
First, the relationship between the image physical coordinate system and the camera coordinate system O_cX_cY_cZ_c: as shown in FIG. 19, suppose an object point P in space has coordinates (X_c, Y_c, Z_c) in the camera coordinate system. The coordinates of the corresponding image point p in the image physical coordinate system are then:
x = f·X_c/Z_c,  y = f·Y_c/Z_c    (5)
where (X_c, Y_c, Z_c) denotes the coordinates of the object point P in the camera coordinate system, f denotes the camera focal length, and Z_c denotes the scale parameter.
Second, the relationship between the camera coordinate system O_cX_cY_cZ_c and the vehicle body coordinate system O_wX_wY_wZ_w:
[X_c, Y_c, Z_c, 1]^T = [[R3, t], [0^T, 1]] · [X_w, Y_w, Z_w, 1]^T    (6)
where 0^T = (0, 0, 0)^T, R3 is the rotation matrix, t is the translation vector, (X_c, Y_c, Z_c) denotes the coordinates of the object point P in the camera coordinate system, and (X_w, Y_w, Z_w) denotes the coordinates of the object point P in the vehicle body coordinate system.
Then, according to formula (2), formula (5) and formula (6), the relationship between the image pixel coordinate system and the vehicle body coordinate system is determined:
Z_c · [u, v, 1]^T = K · [R3 t] · [X_w, Y_w, Z_w, 1]^T    (7)
Therefore, according to formula (7), the at least one coordinate of the three-dimensional model of the first target object in the image coordinate system can be determined from its at least one coordinate in the vehicle body coordinate system, where [u, v]^T denotes the at least one coordinate of the three-dimensional model of the first target object in the image coordinate system, [X_w, Y_w, Z_w] denotes the at least one coordinate of the three-dimensional model in the vehicle body coordinate system, Z_c denotes the scale parameter (which can also be regarded as the depth coordinate of the three-dimensional model in the camera coordinate system), K = [[f/du, 0, u_0], [0, f/dv, v_0], [0, 0, 1]] is the intrinsic matrix of the camera, R3 is the rotation matrix, and t is the translation vector.
In one example, suppose the positions of the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system are 1107, 1108, 1109, 1110, 1111 and 1112. The at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system is projected onto the image coordinate system to obtain Q pixel regions; as shown in FIG. 20, the Q pixel regions corresponding to positions 1107, 1108, 1109, 1110, 1111 and 1112 are determined to be 1201, 1202, 1203, 1204, 1205 and 1206 respectively. The Q pixel regions illustrated in the first image are shown in FIG. 21. Specifically, the correspondence between the positions of the image-coordinate-system coordinates of the reference points, the positions of the vehicle-body-coordinate-system coordinates of the reference points, the positions of the at least one coordinate of the three-dimensional model in the vehicle body coordinate system, and the Q pixel regions is shown in Table 3.
Table 3
Reference point (image coordinate system) → Reference point (vehicle body coordinate system) → Three-dimensional model (vehicle body coordinate system) → Pixel region
1001 → 1007 → 1107 → 1201
1002 → 1008 → 1108 → 1202
1003 → 1009 → 1109 → 1203
1004 → 1010 → 1110 → 1204
1005 → 1011 → 1111 → 1205
1006 → 1012 → 1112 → 1206
In one example, suppose the first image is as shown in FIG. 6, and the Q regions of interest obtained by processing the first image are as shown in FIG. 9. Then, according to the coordinates of the reference point in each region of interest in the image coordinate system, the coordinates of the reference point in each region of interest in the vehicle body coordinate system are determined; the three-dimensional model of the first target object is determined, obtained by three-dimensional modeling of a warning triangle for motor vehicles; then, according to the coordinates of the reference point in each region of interest in the vehicle body coordinate system and the vertex set of the three-dimensional model of the first target object, the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system is determined; the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system is projected onto the image coordinate system to obtain Q pixel regions, as shown in FIG. 22.
In one example, suppose the first image is as shown in FIG. 7, and the Q regions of interest obtained by processing the first image are as shown in FIG. 10. Then, according to the coordinates of the reference point in each region of interest in the image coordinate system, the coordinates of the reference point in each region of interest in the vehicle body coordinate system are determined; the three-dimensional model of the first target object is determined, obtained by three-dimensional modeling of a flat-lying tire; then, according to the coordinates of the reference point in each region of interest in the vehicle body coordinate system and the vertex set of the three-dimensional model of the first target object, the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system is determined; the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system is projected onto the image coordinate system to obtain Q pixel regions, as shown in FIG. 23.
Step S407: Determine the detection result of the target according to the Q regions of interest and the Q pixel regions.
Specifically, the Q regions of interest corresponding to the Q pixel regions are screened to obtain R regions of interest, where each pixel region corresponds to one region of interest, R is a positive integer, and R is less than or equal to Q; then the R circumscribed rectangles corresponding to the R regions of interest are determined; and the detection result of the target is determined according to the R circumscribed rectangles. That each pixel region corresponds to one region of interest can be regarded as a clustering process. Suppose the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system is projected onto the image coordinate system to obtain Q pixel regions, as shown in FIG. 20, which includes 6 pixel regions: pixel region 1201, pixel region 1202, pixel region 1203, pixel region 1204, pixel region 1205 and pixel region 1206. The region of interest corresponding to pixel region 1201 refers to the white area included in pixel region 1201, and the region of interest corresponding to pixel region 1202 refers to the white area included in pixel region 1202; the regions of interest corresponding to pixel regions 1203, 1204, 1205 and 1206 are defined in the same way, and details are not repeated here.
In a possible implementation, screening the Q regions of interest corresponding to the Q pixel regions to obtain the R regions of interest may use the following five screening rules.
Rule 1: When the ratio of the area of the first region of interest among the Q regions of interest to the area of the convex hull of the first region of interest is greater than a first preset value, the first region of interest is taken as one of the R regions of interest. In one example, the first preset value may be 50%. Assuming the area of the first region of interest among the Q regions of interest is 20 pixels and the area of the convex hull of the first region of interest is 25 pixels, the ratio of the area of the first region of interest (20 pixels) to the convex hull area (25 pixels) is 80%; since 80% is greater than 50%, the first region of interest is retained as one of the R regions of interest.
Rule 2: When the ratio of the convex hull aspect ratio of the first region of interest among the Q regions of interest to the aspect ratio of the pixel region corresponding to the first region of interest is greater than a second preset value and less than a third preset value, the first region of interest is taken as one of the R regions of interest. In one example, the second preset value is 0.5 and the third preset value is 2.
Rule 3: When the ratio of the convex hull area of the first region of interest among the Q regions of interest to the area of the pixel region corresponding to the first region of interest is greater than the second preset value and less than the third preset value, the first region of interest is taken as one of the R regions of interest. In one example, the second preset value is 0.5 and the third preset value is 2.
Rule 4: When the IoU of the convex hull contour of the first region of interest among the Q regions of interest and the contour of the pixel region corresponding to the first region of interest is greater than a fourth preset value, the first region of interest is taken as one of the R regions of interest. In one example, the fourth preset value is 70%.
Rule 5: A pre-trained classifier is used to evaluate and score the content in the circumscribed rectangle of the first region of interest among the Q regions of interest, obtaining a first score; when the first score is higher than a fifth preset value, the first region of interest is taken as one of the R regions of interest.
The above Rule 1, Rule 2, Rule 3, Rule 4 and Rule 5 can be combined arbitrarily, which is not limited here. For example, when Rule 1, Rule 2, Rule 3 and Rule 4 are satisfied simultaneously — that is, when the ratio of the area of the first region of interest among the Q regions of interest to its convex hull area is greater than the first preset value, the ratio of the convex hull aspect ratio of the first region of interest to the aspect ratio of the corresponding pixel region is greater than the second preset value and less than the third preset value, the ratio of the convex hull area of the first region of interest to the area of the corresponding pixel region is greater than the second preset value and less than the third preset value, and the IoU of the convex hull contour of the first region of interest and the contour of the corresponding pixel region is greater than the fourth preset value — the first region of interest is taken as one of the R regions of interest.
In a possible implementation, determining the detection result of the target according to the R circumscribed rectangles includes: calculating the areas of the R circumscribed rectangles; and determining the detection result of the target according to the areas of the R circumscribed rectangles.
Specifically, a non-maximum suppression algorithm can be used to determine the detection result of the target. First, all R circumscribed rectangles are marked as not suppressed; then the R circumscribed rectangles are sorted by area in descending order; the traversal starts from the rectangle with the largest area, and for each rectangle that is not suppressed, all rectangles whose degree of overlap with it exceeds a threshold are marked as suppressed; finally, the rectangles that are not suppressed are returned. Of course, other methods may also be used to determine the detection result of the target according to the areas of the R circumscribed rectangles, which is not limited in the embodiments of this application.
In a possible implementation, determining the detection result of the target according to the R circumscribed rectangles includes: using a pre-trained classifier to evaluate and score the contents of the R circumscribed rectangles to obtain R scores; and determining the detection result of the target according to the R scores.
Specifically, after the R scores are obtained, a non-maximum suppression algorithm can be used to determine the detection result of the target. First, all R circumscribed rectangles are marked as not suppressed; then the R circumscribed rectangles are sorted by score in descending order; the traversal starts from the rectangle with the highest score, and for each rectangle that is not suppressed, all rectangles whose degree of overlap with it exceeds a threshold are marked as suppressed; finally, the rectangles that are not suppressed are returned. Of course, other methods may also be used to determine the detection result of the target according to the R scores, which is not limited in the embodiments of this application.
In one example, suppose the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system is projected onto the image coordinate system to obtain Q pixel regions, as shown in FIG. 20, where Q is 6, each pixel region corresponds to one region of interest, and the regions of interest corresponding to the 6 pixel regions are filtered through the filtering rules; here the filtering rule is that Rule 1, Rule 2, Rule 3 and Rule 4 must be satisfied simultaneously. It is determined that the regions of interest corresponding to pixel region 1201, pixel region 1202 and pixel region 1206 in FIG. 20 do not satisfy the above filtering rules and are filtered out, while the regions of interest corresponding to pixel region 1203, pixel region 1204 and pixel region 1205 satisfy the filtering rules and are retained. Rectangular bounding boxes, i.e., detection boxes, are then made for the regions of interest corresponding to pixel regions 1203, 1204 and 1205, namely circumscribed rectangle 1401, circumscribed rectangle 1402 and circumscribed rectangle 1403, in the format [x1, y1, w1, h1], where (x1, y1) is the image coordinate of the upper-left corner of the rectangular box and (w1, h1) is its pixel width and height, as shown in FIG. 24.
In the first method, the areas of circumscribed rectangle 1401, circumscribed rectangle 1402 and circumscribed rectangle 1403 are calculated; according to the non-maximum suppression algorithm that gives priority to the larger area, circumscribed rectangles 1402 and 1403 are removed, so circumscribed rectangle 1401 is output, and the detection result of the target, i.e., the position of the traffic cone in the first image, is finally obtained, as shown in FIG. 25.
In the second method, a pre-trained classifier is used to evaluate and score the contents of circumscribed rectangle 1401, circumscribed rectangle 1402 and circumscribed rectangle 1403, obtaining 3 scores. Assume circumscribed rectangles 1401, 1402 and 1403 are initially not suppressed; they are sorted by score in descending order; the traversal starts from the rectangle with the highest score, and for each rectangle that is not suppressed, all rectangles whose degree of overlap with it exceeds the threshold are marked as suppressed; finally, the rectangle that is not suppressed, i.e., circumscribed rectangle 1401, is returned, and the detection result of the target, i.e., the position of the traffic cone in the first image, is finally obtained, as shown in FIG. 25.
In one example, suppose the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system is projected onto the image coordinate system to obtain Q pixel regions, as shown in FIG. 22, where Q is 2, each pixel region corresponds to one region of interest, and the regions of interest corresponding to the two pixel regions are filtered through the filtering rules; here the filtering rule is that Rule 1, Rule 2, Rule 3 and Rule 4 must be satisfied simultaneously. The region of interest corresponding to pixel region 1902 in FIG. 22 does not satisfy the above filtering rules and is filtered out, while the region of interest corresponding to pixel region 1901 satisfies the filtering rules and is retained. A rectangular bounding box, i.e., a detection box, is then made for the region of interest corresponding to pixel region 1901, in the format [x1, y1, w1, h1], where (x1, y1) is the image coordinate of the upper-left corner of the rectangular box and (w1, h1) is its pixel width and height. Then, according to the circumscribed rectangle, the detection result of the target, i.e., the position of the warning triangle for motor vehicles in the first image, is determined, as shown in FIG. 26.
In one example, suppose the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system is projected onto the image coordinate system to obtain Q pixel regions, as shown in FIG. 23, where each pixel region corresponds to one region of interest, and the regions of interest corresponding to the Q pixel regions are filtered through the filtering rules; here the filtering rule is that Rule 1, Rule 2, Rule 3 and Rule 4 must be satisfied simultaneously. The region of interest corresponding to pixel region 2501 in FIG. 23 satisfies the above filtering rules and is retained, while the regions of interest corresponding to the pixel regions other than pixel region 2501 do not satisfy the filtering rules and are filtered out. A rectangular bounding box, i.e., a detection box, is then made for the region of interest corresponding to pixel region 2501, in the format [x1, y1, w1, h1], where (x1, y1) is the image coordinate of the upper-left corner of the rectangular box and (w1, h1) is its pixel width and height. Then, according to the circumscribed rectangle, the detection result of the target, i.e., the position of the flat-lying tire in the first image, is determined, as shown in FIG. 27.
The method of the embodiments of this application has been described in detail above; the apparatus of the embodiments of this application is provided below.
Referring to FIG. 28, FIG. 28 is a schematic structural diagram of a target detection apparatus 2800 provided by an embodiment of this application. The target detection apparatus may include an acquisition module 2801 and a processing module 2802, and the detailed description of each module is as follows.
The acquisition module 2801 is configured to acquire a first image; the processing module 2802 is configured to process the first image to obtain Q regions of interest and determine the coordinates, in the image coordinate system, of the reference point in each region of interest, where Q is a positive integer; the processing module 2802 is configured to determine, according to the coordinates of the reference point in each region of interest in the image coordinate system, the coordinates of the reference point in each region of interest in the vehicle body coordinate system; the processing module 2802 is configured to determine a three-dimensional model of a first target object; the processing module 2802 is configured to determine, according to the coordinates of the reference point in each region of interest in the vehicle body coordinate system and the vertex set of the three-dimensional model of the first target object, the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system; the processing module 2802 is configured to project the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system onto the image coordinate system to obtain Q pixel regions; and the processing module 2802 is configured to determine the detection result of the target according to the at least one region of interest and the Q pixel regions.
In a possible implementation, the processing module 2802 is further configured to screen the Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest, where each pixel region corresponds to one region of interest, R is a positive integer and R is less than or equal to Q; determine the R circumscribed rectangles corresponding to the R regions of interest; and determine the detection result of the target according to the R circumscribed rectangles.
In still another possible implementation, the processing module 2802 is further configured to: when the ratio of the area of the first region of interest among the Q regions of interest to the area of the convex hull of the first region of interest is greater than the first preset value, take the first region of interest as one of the R regions of interest; when the ratio of the convex hull aspect ratio of the first region of interest among the Q regions of interest to the aspect ratio of the pixel region corresponding to the first region of interest is greater than the second preset value and less than the third preset value, take the first region of interest as one of the R regions of interest; when the ratio of the convex hull area of the first region of interest among the Q regions of interest to the area of the pixel region corresponding to the first region of interest is greater than the second preset value and less than the third preset value, take the first region of interest as one of the R regions of interest; and when the IoU of the convex hull contour of the first region of interest among the Q regions of interest and the contour of the pixel region corresponding to the first region of interest is greater than the fourth preset value, take the first region of interest as one of the R regions of interest.
In still another possible implementation, the processing module 2802 is further configured to use a pre-trained classifier to evaluate and score the content in the circumscribed rectangle of the first region of interest among the Q regions of interest to obtain a first score; and, when the first score is higher than the fifth preset value, take the first region of interest as one of the R regions of interest.
In still another possible implementation, the processing module 2802 is further configured to calculate the areas of the R circumscribed rectangles, and determine the detection result of the target according to the areas of the R circumscribed rectangles.
In still another possible implementation, the processing module 2802 is further configured to use a pre-trained classifier to evaluate and score the contents of the R circumscribed rectangles to obtain R scores, and determine the detection result of the target according to the R scores.
In still another possible implementation, the vertex set of the three-dimensional model of the first target object includes: a first upper vertex (0, 0, H1) and n equal-division points on the first bottom circle corresponding to the first upper vertex, (R1·cos(2πk/n), R1·sin(2πk/n), 0), where H1 denotes the first height, R1 denotes the radius of the first bottom circle corresponding to the first upper vertex, the three-dimensional model coordinate system is defined with the center of the first bottom circle as the origin (0, 0, 0), the X axis pointing forward, the Y axis pointing left and the Z axis pointing up, k = 0, 1, 2, …, n−1, and n is a positive integer.
In still another possible implementation, the vertex set of the three-dimensional model of the first target object includes: a second upper vertex (0, 0, L·cos(π/3)), a left vertex (0, L/2, 0) and a right vertex (0, −L/2, 0), where L denotes the side length, and the three-dimensional model coordinate system is defined with the center of the base edge as the origin (0, 0, 0), the X axis pointing forward, the Y axis pointing left and the Z axis pointing up.
In still another possible implementation, the vertex set of the three-dimensional model of the first target object includes: m equal-division points on the second bottom circle, (R2·cos(2πk/m), R2·sin(2πk/m), 0), and m equal-division points on the top circle corresponding to the second bottom circle, (R2·cos(2πk/m), R2·sin(2πk/m), H2), where H2 denotes the second height, R2 denotes the radius of the second bottom circle, k = 0, 1, 2, …, m−1, and m is a positive integer.
It should be noted that, for the implementation and beneficial effects of each module, reference may also be made to the corresponding description of the method embodiment shown in FIG. 4.
Referring to FIG. 29, FIG. 29 shows a target detection apparatus 2900 provided by an embodiment of this application. The apparatus 2900 includes a processor 2901 and a communication interface 2903, and optionally further includes a memory 2902. The processor 2901, the memory 2902 and the communication interface 2903 are connected to one another through a bus 2904.
The memory 2902 includes but is not limited to a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a compact disc read-only memory (CD-ROM); the memory 2902 is used to store related computer programs and data. The communication interface 2903 is used to receive and send data.
The processor 2901 may be one or more central processing units (CPUs). Where the processor 2901 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
The processor 2901 in the apparatus 2900 is configured to read the computer program code stored in the memory 2902 and execute the method performed in FIG. 4 above.
The target detection apparatus may be a vehicle with a target detection function, or another component with a target detection function. The target detection apparatus includes but is not limited to: a vehicle-mounted terminal, a vehicle-mounted controller, a vehicle-mounted module, vehicle-mounted components, a vehicle-mounted chip, a vehicle-mounted unit, a vehicle-mounted radar, a vehicle-mounted camera or other sensors. A vehicle can implement the method provided in this application through the vehicle-mounted terminal, vehicle-mounted controller, vehicle-mounted module, vehicle-mounted component, vehicle-mounted chip, vehicle-mounted unit, vehicle-mounted radar or camera.
The target detection apparatus may also be an intelligent terminal with a target detection function other than a vehicle, or be disposed in such an intelligent terminal, or be disposed in a component of the intelligent terminal. The intelligent terminal may be other terminal equipment such as an intelligent transportation device, a smart home device or a robot. The target detection apparatus includes but is not limited to the intelligent terminal, or a controller, a chip, a sensor such as a radar or a camera, or other components in the intelligent terminal.
The target detection apparatus may be a general-purpose device or a special-purpose device. In a specific implementation, the apparatus may also be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, or another device with a processing function. The embodiments of this application do not limit the type of the target detection apparatus.
The target detection apparatus may also be a chip or processor with a processing function, and the target detection apparatus may include multiple processors. The processor may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. The chip or processor with a processing function may be disposed in the sensor, or may not be disposed in the sensor but disposed at the receiving end of the output signal of the sensor.
An embodiment of this application further provides a chip system, the chip system including at least one processor and a communication interface, where the at least one processor is configured to call a computer program from the communication interface; when the processor executes the instructions, the method flow shown in FIG. 4 is implemented.
An embodiment of this application further provides a computer-readable storage medium storing a computer program; when the computer program runs on a computer, the method flow shown in FIG. 4 is implemented.
An embodiment of this application further provides a computer program product; when the computer program product runs on a computer, the method flow shown in FIG. 4 is implemented.
An embodiment of this application further provides a vehicle, the vehicle including at least one target detection apparatus. Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware; the computer program may be stored in a computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments. The foregoing storage medium includes various media that can store computer program code, such as a ROM, a random access memory (RAM), a magnetic disk or an optical disc.

Claims (22)

  1. A target detection method, characterized by comprising:
    acquiring a first image;
    processing the first image to obtain Q regions of interest, and determining the coordinates, in the image coordinate system, of the reference point in each region of interest;
    determining, according to the coordinates of the reference point in each region of interest in the image coordinate system, the coordinates of the reference point in each region of interest in the vehicle body coordinate system;
    determining a three-dimensional model of a first target object;
    determining, according to the coordinates of the reference point in each region of interest in the vehicle body coordinate system and a vertex set of the three-dimensional model of the first target object, at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system;
    projecting the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system onto the image coordinate system to obtain Q pixel regions; and
    determining a detection result of the target according to the Q regions of interest and the Q pixel regions, wherein Q is a positive integer.
  2. The method according to claim 1, characterized in that determining the detection result of the target according to the Q regions of interest and the Q pixel regions comprises:
    screening the Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest, wherein each pixel region corresponds to one region of interest, R is a positive integer, and R is less than or equal to Q;
    determining R circumscribed rectangles corresponding to the R regions of interest; and
    determining the detection result of the target according to the R circumscribed rectangles.
  3. The method according to claim 2, characterized in that screening the Q regions of interest corresponding to the Q pixel regions to obtain the R regions of interest comprises at least one of the following:
    when the ratio of the area of a first region of interest among the Q regions of interest to the area of the convex hull of the first region of interest is greater than a first preset value, taking the first region of interest as one of the R regions of interest;
    when the ratio of the convex hull aspect ratio of the first region of interest among the Q regions of interest to the aspect ratio of the pixel region corresponding to the first region of interest is greater than a second preset value and less than a third preset value, taking the first region of interest as one of the R regions of interest;
    when the ratio of the convex hull area of the first region of interest among the Q regions of interest to the area of the pixel region corresponding to the first region of interest is greater than the second preset value and less than the third preset value, taking the first region of interest as one of the R regions of interest;
    when the intersection-over-union IoU of the convex hull contour of the first region of interest among the Q regions of interest and the contour of the pixel region corresponding to the first region of interest is greater than a fourth preset value, taking the first region of interest as one of the R regions of interest.
  4. The method according to claim 2, characterized in that screening the Q regions of interest corresponding to the Q pixel regions to obtain the R regions of interest comprises:
    using a pre-trained classifier to evaluate and score the content in the circumscribed rectangle of a first region of interest among the Q regions of interest to obtain a first score; and
    when the first score is higher than a fifth preset value, taking the first region of interest as one of the R regions of interest.
  5. The method according to any one of claims 2-4, characterized in that determining the detection result of the target according to the R circumscribed rectangles comprises:
    calculating the areas of the R circumscribed rectangles; and
    determining the detection result of the target according to the areas of the R circumscribed rectangles.
  6. The method according to any one of claims 2-4, characterized in that determining the detection result of the target according to the R circumscribed rectangles comprises:
    using a pre-trained classifier to evaluate and score the contents of the R circumscribed rectangles to obtain R scores; and
    determining the detection result of the target according to the R scores.
  7. The method according to any one of claims 1-6, characterized in that the vertex set of the three-dimensional model of the first target object comprises:
    a first upper vertex (0, 0, H1) and n equal-division points on a first bottom circle corresponding to the first upper vertex, (R1·cos(2πk/n), R1·sin(2πk/n), 0),
    wherein H1 denotes a first height, R1 denotes the radius of the first bottom circle corresponding to the first upper vertex, the three-dimensional model coordinate system is defined with the center of the first bottom circle as the origin (0, 0, 0) and with the X axis pointing forward, the Y axis pointing left and the Z axis pointing up, k = 0, 1, 2, …, n−1, and n is a positive integer.
  8. The method according to any one of claims 1-6, characterized in that the vertex set of the three-dimensional model of the first target object comprises:
    a second upper vertex (0, 0, L·cos(π/3)), a left vertex (0, L/2, 0) and a right vertex (0, −L/2, 0),
    wherein L denotes a side length, and the three-dimensional model coordinate system is defined with the center of the base edge as the origin (0, 0, 0) and with the X axis pointing forward, the Y axis pointing left and the Z axis pointing up.
  9. The method according to any one of claims 1-7, characterized in that the vertex set of the three-dimensional model of the first target object comprises:
    m equal-division points on a second bottom circle, (R2·cos(2πk/m), R2·sin(2πk/m), 0), and m equal-division points on a top circle corresponding to the second bottom circle, (R2·cos(2πk/m), R2·sin(2πk/m), H2),
    wherein H2 denotes a second height, R2 denotes the radius of the second bottom circle, k = 0, 1, 2, …, m−1, and m is a positive integer.
  10. A target detection apparatus, characterized by comprising:
    an acquisition module configured to acquire a first image;
    a processing module configured to process the first image to obtain Q regions of interest and determine the coordinates, in the image coordinate system, of the reference point in each region of interest;
    the processing module being configured to determine, according to the coordinates of the reference point in each region of interest in the image coordinate system, the coordinates of the reference point in each region of interest in the vehicle body coordinate system;
    the processing module being configured to determine a three-dimensional model of a first target object;
    the processing module being configured to determine, according to the coordinates of the reference point in each region of interest in the vehicle body coordinate system and a vertex set of the three-dimensional model of the first target object, at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system;
    the processing module being configured to project the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system onto the image coordinate system to obtain Q pixel regions; and
    the processing module being configured to determine a detection result of the target according to the at least one region of interest and the Q pixel regions, wherein Q is a positive integer.
  11. The apparatus according to claim 10, characterized in that
    the processing module is further configured to screen the Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest, wherein each pixel region corresponds to one region of interest, R is a positive integer and R is less than or equal to Q; determine R circumscribed rectangles corresponding to the R regions of interest; and determine the detection result of the target according to the R circumscribed rectangles.
  12. The apparatus according to claim 11, characterized in that
    the processing module is further configured to: when the ratio of the area of a first region of interest among the Q regions of interest to the area of the convex hull of the first region of interest is greater than a first preset value, take the first region of interest as one of the R regions of interest; when the ratio of the convex hull aspect ratio of the first region of interest among the Q regions of interest to the aspect ratio of the pixel region corresponding to the first region of interest is greater than a second preset value and less than a third preset value, take the first region of interest as one of the R regions of interest; when the ratio of the convex hull area of the first region of interest among the Q regions of interest to the area of the pixel region corresponding to the first region of interest is greater than the second preset value and less than the third preset value, take the first region of interest as one of the R regions of interest; and when the IoU of the convex hull contour of the first region of interest among the Q regions of interest and the contour of the pixel region corresponding to the first region of interest is greater than a fourth preset value, take the first region of interest as one of the R regions of interest.
  13. The apparatus according to claim 11, characterized in that
    the processing module is further configured to use a pre-trained classifier to evaluate and score the content in the circumscribed rectangle of a first region of interest among the Q regions of interest to obtain a first score; and, when the first score is higher than a fifth preset value, take the first region of interest as one of the R regions of interest.
  14. The apparatus according to any one of claims 11-13, characterized in that
    the processing module is further configured to calculate the areas of the R circumscribed rectangles, and determine the detection result of the target according to the areas of the R circumscribed rectangles.
  15. The apparatus according to any one of claims 11-13, characterized in that
    the processing module is further configured to use a pre-trained classifier to evaluate and score the contents of the R circumscribed rectangles to obtain R scores, and determine the detection result of the target according to the R scores.
  16. The apparatus according to any one of claims 10-15, characterized in that the vertex set of the three-dimensional model of the first target object comprises:
    a first upper vertex (0, 0, H1) and n equal-division points on a first bottom circle corresponding to the first upper vertex, (R1·cos(2πk/n), R1·sin(2πk/n), 0),
    wherein H1 denotes a first height, R1 denotes the radius of the first bottom circle corresponding to the first upper vertex, the three-dimensional model coordinate system is defined with the center of the first bottom circle as the origin (0, 0, 0) and with the X axis pointing forward, the Y axis pointing left and the Z axis pointing up, k = 0, 1, 2, …, n−1, and n is a positive integer.
  17. The apparatus according to any one of claims 10-15, characterized in that the vertex set of the three-dimensional model of the first target object comprises:
    a second upper vertex (0, 0, L·cos(π/3)), a left vertex (0, L/2, 0) and a right vertex (0, −L/2, 0),
    wherein L denotes a side length, and the three-dimensional model coordinate system is defined with the center of the base edge as the origin (0, 0, 0) and with the X axis pointing forward, the Y axis pointing left and the Z axis pointing up.
  18. The apparatus according to any one of claims 10-15, characterized in that the vertex set of the three-dimensional model of the first target object comprises:
    m equal-division points on a second bottom circle, (R2·cos(2πk/m), R2·sin(2πk/m), 0), and m equal-division points on a top circle corresponding to the second bottom circle, (R2·cos(2πk/m), R2·sin(2πk/m), H2),
    wherein H2 denotes a second height, R2 denotes the radius of the second bottom circle, k = 0, 1, 2, …, m−1, and m is a positive integer.
  19. A target detection apparatus, characterized by comprising a processor and a memory, wherein the memory is configured to store one or more programs, the one or more programs comprising computer-executable instructions; when the apparatus runs, the processor executes the one or more programs stored in the memory to cause the apparatus to perform the method according to any one of claims 1-9.
  20. A chip system, characterized in that the chip system comprises at least one processor and an acquisition interface, the at least one processor being configured to call a computer program from the acquisition interface; when the processor executes the instructions, the apparatus in which the chip system is located is caused to implement the method according to any one of claims 1-9.
  21. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when run on a computer, causes the computer to perform the method according to any one of claims 1-9.
  22. A vehicle, characterized in that the vehicle comprises the target detection apparatus according to any one of claims 10 to 19.
PCT/CN2021/131569 2021-01-08 2021-11-18 Target detection method and apparatus WO2022148143A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110026498.9A 2021-01-08 2021-01-08 Target detection method and apparatus
CN202110026498.9 2021-01-08

Publications (1)

Publication Number Publication Date
WO2022148143A1 true WO2022148143A1 (zh) 2022-07-14

Family

ID=82357830

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/131569 Target detection method and apparatus 2021-01-08 2021-11-18

Country Status (2)

Country Link
CN (1) CN114792416A (zh)
WO (1) WO2022148143A1 (zh)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147616B (zh) * 2022-07-27 2024-08-20 安徽清洛数字科技有限公司 Road surface water depth detection method based on vehicle tire key points


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101067557A (zh) * 2007-07-03 2007-11-07 北京控制工程研究所 Monocular vision navigation method for environment perception of autonomous mobile vehicles
CN107886043A (zh) * 2017-07-20 2018-04-06 吉林大学 Vision-based forward vehicle and pedestrian anti-collision warning system and method
US20190072971A1 (en) * 2017-09-01 2019-03-07 Honda Motor Co., Ltd. Vehicle control device, vehicle control method, and storage medium
CN111414857A (zh) * 2020-03-20 2020-07-14 辽宁工业大学 Vision-based forward vehicle detection method with multi-feature fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN XUEWEN , CHEN HUAQING , PEI YUEYING: "Front Vehicle Detection Method in Advanced Driver Assistance System Based on Multi-Feature Fusion", JOURNAL OF COMPUTER APPLICATIONS, vol. 40, no. S1, 10 July 2020 (2020-07-10), pages 185 - 188, XP055949345, ISSN: 1001-9081, DOI: 10.11772/J.ISSN.1001-9081.2019122158 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230215026A1 (en) * 2022-01-03 2023-07-06 GM Global Technology Operations LLC On-vehicle spatial monitoring system
US12086996B2 (en) * 2022-01-03 2024-09-10 GM Global Technology Operations LLC On-vehicle spatial monitoring system
CN115511807A (zh) * 2022-09-16 2022-12-23 北京远舢智能科技有限公司 Method and device for determining groove position and depth
CN115511807B (zh) * 2022-09-16 2023-07-28 北京远舢智能科技有限公司 Method and device for determining groove position and depth
CN117934804A (zh) * 2024-03-15 2024-04-26 深圳市森美协尔科技有限公司 Method for determining whether wafer probing is qualified and related device
CN117934804B (zh) * 2024-03-15 2024-06-04 深圳市森美协尔科技有限公司 Method for determining whether wafer probing is qualified and related device

Also Published As

Publication number Publication date
CN114792416A (zh) 2022-07-26


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21917184

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21917184

Country of ref document: EP

Kind code of ref document: A1