CN114792416A - Target detection method and device - Google Patents

Target detection method and device

Info

Publication number
CN114792416A
CN114792416A
Authority
CN
China
Prior art keywords
interested, region, coordinate system, interest, area
Prior art date
Legal status
Pending
Application number
CN202110026498.9A
Other languages
Chinese (zh)
Inventor
云一宵
郑迪威
马志贤
苏惠荞
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202110026498.9A
Priority to PCT/CN2021/131569 (published as WO2022148143A1)
Publication of CN114792416A


Classifications

    • G06F18/24 Classification techniques
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/269 Analysis of motion using gradient-based methods
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/764 Recognition using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/64 Three-dimensional objects
    • G06T2207/20104 Interactive definition of region of interest [ROI]
    • G06T2207/30261 Obstacle

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Geometry (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

An embodiment of the application provides a target detection method and a target detection device. The method comprises: acquiring a first image; processing the first image to obtain Q regions of interest and determining the coordinates of the image coordinate system corresponding to the reference point in each region of interest; determining the coordinates of the vehicle body coordinate system corresponding to the reference points according to the coordinates of the image coordinate system corresponding to the reference points; determining a three-dimensional model of a first target object; determining at least one coordinate of the three-dimensional model in the vehicle body coordinate system according to the coordinates of the vehicle body coordinate system corresponding to the reference points and the vertex set of the three-dimensional model; projecting the at least one coordinate into the image coordinate system to obtain Q pixel regions; and determining the detection result of the target according to the Q regions of interest and the Q pixel regions. With the embodiment of the application, obstacles in a traffic scene can be detected in real time, improving the capability of an advanced driver assistance system in automatic driving or assisted driving.

Description

Target detection method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a target detection method and apparatus.
Background
With the development of society, intelligent terminals such as intelligent transportation equipment, smart home devices, and robots are gradually entering people's daily lives. Sensors play an important role in intelligent terminals. The various sensors installed on an intelligent terminal, such as millimeter-wave radar, lidar, cameras, and ultrasonic radar, sense the surrounding environment while the terminal moves, collect data, identify and track moving objects, measure their speed and distance, identify and locate static scene elements such as lane lines and traffic-scene objects, and, combined with navigation and map data, perform path planning and other behavior control.
In a typical traffic scene, there may be objects that occupy the drivable road surface and hinder the forward movement of a vehicle, as shown in fig. 1, for example a traffic cone, a triangular warning sign for motor vehicles, or a flat tire. How to detect such objects in real time and provide important information for subsequent path planning is a technical problem that urgently needs to be solved.
Disclosure of Invention
The embodiment of the application discloses a target detection method and device, which can detect obstacles in a traffic scene in real time, provide important information for subsequent path planning, and improve the capability of an advanced driver assistance system in automatic driving or assisted driving.
A first aspect of an embodiment of the present application discloses a target detection method, including: acquiring a first image; processing the first image to obtain Q regions of interest, and determining the coordinates of the image coordinate system corresponding to the reference point in each region of interest, where Q is a positive integer; determining the coordinates of the vehicle body coordinate system corresponding to the reference point in each region of interest according to the coordinates of the image coordinate system corresponding to that reference point; determining a three-dimensional model of a first target object; determining at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system according to the coordinates of the vehicle body coordinate system corresponding to the reference point in each region of interest and the vertex set of the three-dimensional model of the first target object; projecting the at least one coordinate of the three-dimensional model in the vehicle body coordinate system into the image coordinate system to obtain Q pixel regions; and determining the detection result of the target according to the Q regions of interest and the Q pixel regions.
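The overall flow can be outlined as below. This is a minimal sketch in Python rather than code from the patent: every helper name (extract_rois, reference_point, image_to_body, project_to_image, match_and_decide) is a hypothetical stand-in for one of the steps just listed.

```python
import numpy as np

# Hypothetical stand-ins for the concrete steps detailed in the description.
def extract_rois(image): ...                 # Q regions of interest
def reference_point(roi): ...                # one reference point per region
def image_to_body(point, camera): ...        # back-projection to the body frame
def project_to_image(vertices, camera): ...  # projection to one pixel region
def match_and_decide(rois, pixel_regions): ...

def detect_targets(first_image, camera, model_vertices):
    rois = extract_rois(first_image)
    pixel_regions = []
    for roi in rois:
        p_img = reference_point(roi)              # image coordinate system
        p_body = image_to_body(p_img, camera)     # vehicle body coordinate system
        placed = np.asarray(model_vertices) + p_body  # place the 3D model
        pixel_regions.append(project_to_image(placed, camera))
    return match_and_decide(rois, pixel_regions)
```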
In this method, traffic-scene obstacles such as traffic cones, triangular warning signs for motor vehicles, and tires have specific parameters specified in international or national standards, so a three-dimensional model coordinate system can be defined with such an obstacle as a reference, yielding the three-dimensional model of the first target object. In the embodiment of the application, the three-dimensional model of the first target object is placed at the vehicle-body-coordinate-system coordinates corresponding to the reference point in each region of interest, giving the coordinates of the three-dimensional model in the vehicle body coordinate system. In this way, even if the shape, size and dimensions of a traffic-scene obstacle change to some extent, it can still be detected as long as its overall shape is not severely altered. In addition, the reference point of each region of interest is converted from the image coordinate system to the vehicle body coordinate system, the three-dimensional model of the first target object is placed at the corresponding body-coordinate-system position to determine its coordinates in that system, and those coordinates are then projected into the image coordinate system for matching. In other words, detection is achieved by processing and analyzing the imaging characteristics of the first target object, so no large set of pre-collected and labeled training samples is needed for training, and the computational complexity is low.
In a possible implementation manner, the determining a detection result of the target according to the Q regions of interest and the Q pixel regions includes: screening the Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest, where each pixel region corresponds to one region of interest, R is a positive integer, and R ≤ Q; determining R circumscribed rectangles corresponding to the R regions of interest; and determining the detection result of the target according to the R circumscribed rectangles.
In this method, screening the Q regions of interest down to R regions of interest effectively removes image noise and improves the accuracy of the target detection result. The detection result can then be determined from the R circumscribed rectangles: the R circumscribed rectangles are processed with a non-maximum suppression algorithm to remove redundant rectangles, and the detection result of the target is finally determined, which speeds up target detection.
In another possible implementation manner, the screening of the Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest includes at least one of the following: when the ratio of the area of a first region of interest among the Q regions of interest to the area of its convex envelope is greater than a first preset value, taking the first region of interest as one of the R regions of interest; when the ratio of the aspect ratio of the convex envelope of a first region of interest to the aspect ratio of the pixel region corresponding to that region of interest is greater than a second preset value and less than a third preset value, taking the first region of interest as one of the R regions of interest; when the ratio of the area of the convex envelope of a first region of interest to the area of the pixel region corresponding to that region of interest is greater than the second preset value and less than the third preset value, taking the first region of interest as one of the R regions of interest; and when the intersection-over-union (IoU) of the convex envelope contour of a first region of interest and the contour of the pixel region corresponding to that region of interest is greater than a fourth preset value, taking the first region of interest as one of the R regions of interest.
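A sketch of these four screening conditions under stated assumptions: the thresholds t1 to t4 stand in for the first to fourth preset values (the patent gives no concrete numbers), the area of a region of interest is approximated by its pixel count, and the mask shape used for the contour IoU is an assumed image size.

```python
import cv2
import numpy as np

def contour_iou(c1, c2, shape=(1080, 1920)):
    """IoU of two closed contours, computed by rasterising both to masks."""
    m1 = np.zeros(shape, np.uint8)
    m2 = np.zeros(shape, np.uint8)
    cv2.fillPoly(m1, [c1.reshape(-1, 2).astype(np.int32)], 1)
    cv2.fillPoly(m2, [c2.reshape(-1, 2).astype(np.int32)], 1)
    inter = np.logical_and(m1, m2).sum()
    union = np.logical_or(m1, m2).sum()
    return inter / union if union else 0.0

def keep_roi(roi_pts, pix_pts, t1=0.8, t2=0.7, t3=1.3, t4=0.7):
    """roi_pts: Nx2 pixel coordinates of one region of interest;
    pix_pts: Mx2 pixel coordinates of the corresponding pixel region.
    Returns True if at least one of the four conditions holds."""
    roi_hull = cv2.convexHull(roi_pts.astype(np.int32))
    pix_hull = cv2.convexHull(pix_pts.astype(np.int32))
    hull_area = cv2.contourArea(roi_hull)
    # 1) region area (pixel count) vs its convex-envelope area
    if hull_area > 0 and len(roi_pts) / hull_area > t1:
        return True
    # 2) convex-envelope aspect ratio vs pixel-region aspect ratio
    _, _, hw, hh = cv2.boundingRect(roi_hull)
    _, _, pw, ph = cv2.boundingRect(pix_hull)
    if hh and ph and pw and t2 < (hw / hh) / (pw / ph) < t3:
        return True
    # 3) convex-envelope area vs pixel-region area
    pix_area = cv2.contourArea(pix_hull)
    if pix_area > 0 and t2 < hull_area / pix_area < t3:
        return True
    # 4) IoU of the convex-envelope contour and the pixel-region contour
    return contour_iou(roi_hull, pix_hull) > t4
```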
In another possible implementation manner, the screening of the Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest includes: evaluating and scoring the content inside the circumscribed rectangle of a first region of interest among the Q regions of interest with a pre-trained classifier to obtain a first score; and when the first score is higher than a fifth preset value, taking the first region of interest as one of the R regions of interest.
In another possible implementation manner, the determining, according to the R circumscribed rectangles, a detection result of the target includes: calculating the areas of the R circumscribed rectangles; and determining the detection result of the target according to the areas of the R circumscribed rectangles.
In the method, the detection result of the target can be rapidly determined through the areas of the R circumscribed rectangles, and the target detection efficiency is improved.
In another possible implementation manner, the determining, according to the R circumscribed rectangles, a detection result of the target includes: evaluating and scoring the contents in the R circumscribed rectangles by using a pre-trained classifier to obtain R scores; and determining the detection result of the target according to the R scores.
A second aspect of the embodiments of the present application discloses a target detection apparatus, including: an acquisition module configured to acquire a first image; and a processing module configured to: process the first image to obtain Q regions of interest and determine the coordinates of the image coordinate system corresponding to the reference point in each region of interest, where Q is a positive integer; determine the coordinates of the vehicle body coordinate system corresponding to the reference point in each region of interest according to the coordinates of the image coordinate system corresponding to that reference point; determine a three-dimensional model of the first target object; determine at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system according to the coordinates of the vehicle body coordinate system corresponding to the reference point in each region of interest and the vertex set of the three-dimensional model; project the at least one coordinate of the three-dimensional model in the vehicle body coordinate system into the image coordinate system to obtain Q pixel regions; and determine the detection result of the target according to the Q regions of interest and the Q pixel regions.
In a possible implementation manner, the processing module is further configured to: screen the Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest, where each pixel region corresponds to one region of interest, R is a positive integer, and R ≤ Q; determine R circumscribed rectangles corresponding to the R regions of interest; and determine the detection result of the target according to the R circumscribed rectangles.
In yet another possible implementation manner, the processing module is further configured to: when the ratio of the area of a first region of interest among the Q regions of interest to the area of its convex envelope is greater than a first preset value, take the first region of interest as one of the R regions of interest; when the ratio of the aspect ratio of the convex envelope of a first region of interest to the aspect ratio of the corresponding pixel region is greater than a second preset value and less than a third preset value, take the first region of interest as one of the R regions of interest; when the ratio of the area of the convex envelope of a first region of interest to the area of the corresponding pixel region is greater than the second preset value and less than the third preset value, take the first region of interest as one of the R regions of interest; and when the intersection-over-union (IoU) of the convex envelope contour of a first region of interest and the contour of the corresponding pixel region is greater than a fourth preset value, take the first region of interest as one of the R regions of interest.
In another possible implementation manner, the processing module is further configured to: evaluate and score the content inside the circumscribed rectangle of a first region of interest among the Q regions of interest with a pre-trained classifier to obtain a first score; and take the first region of interest as one of the R regions of interest when the first score is higher than a fifth preset value.
In yet another possible implementation manner, the processing module is further configured to: calculate the areas of the R circumscribed rectangles; and determine the detection result of the target according to the areas of the R circumscribed rectangles.
In another possible implementation manner, the processing module is further configured to: evaluate and score the contents in the R circumscribed rectangles with a pre-trained classifier to obtain R scores; and determine the detection result of the target according to the R scores.
With regard to the technical effects brought about by the second aspect or a possible implementation, reference may be made to the introduction to the technical effects of the first aspect or a corresponding implementation.
A third aspect of the embodiments of the present application discloses a target detection apparatus, including a processor and a memory, where the memory is configured to store one or more programs comprising computer-executable instructions, and the processor is configured to invoke the one or more programs stored in the memory to perform the following: acquiring a first image; processing the first image to obtain Q regions of interest, and determining the coordinates of the image coordinate system corresponding to the reference point in each region of interest; determining the coordinates of the vehicle body coordinate system corresponding to the reference point in each region of interest according to the coordinates of the image coordinate system corresponding to that reference point; determining a three-dimensional model of a first target object; determining at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system according to the coordinates of the vehicle body coordinate system corresponding to the reference point in each region of interest and the vertex set of the three-dimensional model; projecting the at least one coordinate of the three-dimensional model in the vehicle body coordinate system into the image coordinate system to obtain Q pixel regions; and determining the detection result of the target according to the Q regions of interest and the Q pixel regions, where Q is a positive integer.
In a possible implementation manner, the at least one processor is further configured to: screen the Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest, where each pixel region corresponds to one region of interest, R is a positive integer, and R ≤ Q; determine R circumscribed rectangles corresponding to the R regions of interest; and determine the detection result of the target according to the R circumscribed rectangles.
In yet another possible implementation manner, the at least one processor is further configured to: when the ratio of the area of a first region of interest among the Q regions of interest to the area of its convex envelope is greater than a first preset value, take the first region of interest as one of the R regions of interest; when the ratio of the aspect ratio of the convex envelope of a first region of interest to the aspect ratio of the corresponding pixel region is greater than a second preset value and less than a third preset value, take the first region of interest as one of the R regions of interest; when the ratio of the area of the convex envelope of a first region of interest to the area of the corresponding pixel region is greater than the second preset value and less than the third preset value, take the first region of interest as one of the R regions of interest; and when the intersection-over-union (IoU) of the convex envelope contour of a first region of interest and the contour of the corresponding pixel region is greater than a fourth preset value, take the first region of interest as one of the R regions of interest.
In yet another possible implementation manner, the at least one processor is further configured to: evaluate and score the content inside the circumscribed rectangle of a first region of interest among the Q regions of interest with a pre-trained classifier to obtain a first score; and take the first region of interest as one of the R regions of interest when the first score is higher than a fifth preset value.
In yet another possible implementation manner, the at least one processor is further configured to: calculate the areas of the R circumscribed rectangles; and determine the detection result of the target according to the areas of the R circumscribed rectangles.
In yet another possible implementation manner, the at least one processor is further configured to: evaluate and score the contents in the R circumscribed rectangles with a pre-trained classifier to obtain R scores; and determine the detection result of the target according to the R scores.
With regard to technical effects brought about by the third aspect or a possible implementation, reference may be made to the introduction of the technical effects of the first aspect or the corresponding implementation.
With reference to any one of the above aspects or any one of the possible implementation manners of any one of the above aspects, in a further possible implementation manner, the set of vertices of the three-dimensional model of the first target object includes: a first upper vertex (0, 0, H1) and n equal-division points on the first base circle corresponding to the first upper vertex,

(R1·cos(2πk/n), R1·sin(2πk/n), 0), k = 0, 1, 2, …, n−1,

where H1 denotes the first height, R1 denotes the radius of the first base circle corresponding to the first upper vertex, the center of the first base circle is the origin (0, 0, 0), the coordinate axes form a three-dimensional model coordinate system with the X-axis forward, the Y-axis leftward and the Z-axis upward, and n is a positive integer.
With reference to any one of the above aspects or any one of the possible implementation manners of any one of the above aspects, in yet another possible implementation manner, the set of vertices of the three-dimensional model of the first target object includes: a second upper vertex (0, 0, L·cos(π/3)), a left vertex (0, L/2, 0), and a right vertex (0, −L/2, 0), where L denotes the side length, the center of the bottom edge is the origin (0, 0, 0), and the coordinate axes form a three-dimensional model coordinate system with the X-axis forward, the Y-axis leftward and the Z-axis upward.
With reference to any one of the above aspects or any one of the possible implementation manners of any one of the above aspects, in yet another possible implementation manner, the set of vertices of the three-dimensional model of the first target object includes: m equal-division points on the second base circle,

(R2·cos(2πk/m), R2·sin(2πk/m), 0),

and m equal-division points on the top circle corresponding to the second base circle,

(R2·cos(2πk/m), R2·sin(2πk/m), H2),

where H2 denotes the second height, R2 denotes the radius of the second base circle, k = 0, 1, 2, …, m−1, and m is a positive integer.
A fourth aspect of the embodiments of the present application discloses a chip system, where the chip system includes at least one processor and an acquisition interface, and the at least one processor is configured to invoke a computer program from the acquisition interface, so as to implement the method described in any one of the above aspects or any possible implementation manner of any one of the above aspects.
A fifth aspect of embodiments of the present application discloses a computer-readable storage medium, where a computer program is stored, and when the computer program runs on a computer, the method described in any one of the above aspects or in any possible implementation manner of any one of the above aspects is implemented.
A sixth aspect of embodiments of the present application discloses a vehicle including the object detection device of the second aspect or the object detection device of the third aspect.
Drawings
The drawings used in the embodiments of the present application are described below.
FIG. 1 is a schematic diagram of a target obstacle in a traffic scene according to an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of an object detection system according to an embodiment of the present application;
FIG. 3 is an exemplary diagram of a traffic scene provided in an embodiment of the present application;
fig. 4 is a schematic flowchart of a target detection method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an image coordinate system provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a first image provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a further first image provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of a further first image provided by an embodiment of the present application;
fig. 9 is a schematic diagram of a region of interest obtained by processing a first image according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a region of interest obtained by processing a first image according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a region of interest obtained by processing a first image according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a back projection process provided by an embodiment of the present application;
fig. 13 is a schematic diagram of a camera coordinate system and an image coordinate system provided in an embodiment of the present application;
fig. 14 is a schematic diagram, provided in an embodiment of the present application, of the correspondence between the position of the image-coordinate-system coordinates of a reference point and the position of the vehicle-body-coordinate-system coordinates of that reference point;
FIG. 15 is a schematic illustration of a three-dimensional model of a first target object provided by an embodiment of the present application;
FIG. 16 is a schematic illustration of a three-dimensional model of yet another first target object provided by an embodiment of the present application;
FIG. 17 is a schematic illustration of a three-dimensional model of yet another first target object provided by an embodiment of the present application;
FIG. 18 is a schematic diagram of the process of placing the three-dimensional model of the first target object at the position of the vehicle-body-coordinate-system coordinates corresponding to a reference point, according to an embodiment of the present disclosure;
FIG. 19 is a schematic diagram of a back projection provided by an embodiment of the application;
fig. 20 is a schematic diagram of a pixel region according to an embodiment of the present application;
fig. 21 is a schematic diagram of a pixel region in a first image according to an embodiment of the present disclosure;
fig. 22 is a schematic diagram of another pixel region provided in this embodiment of the present application;
FIG. 23 is a schematic diagram of yet another pixel region provided in an embodiment of the present application;
FIG. 24 is a schematic diagram of a circumscribed rectangle provided in an embodiment of the present application;
FIG. 25 is a diagram illustrating a detection result of a target according to an embodiment of the present application;
FIG. 26 is a diagram illustrating a detection result of a target according to an embodiment of the present disclosure;
FIG. 27 is a diagram illustrating a detection result of a target according to an embodiment of the present application;
FIG. 28 is a schematic view of an object detection device according to an embodiment of the present application;
fig. 29 is a schematic diagram of an apparatus for detecting an object according to an embodiment of the present application.
Detailed Description
The embodiments of the present application are described below with reference to the drawings.
Referring to fig. 2, fig. 2 is a schematic structural diagram of an object detection system 2000 provided in an embodiment of the present application. The system includes an acquisition module 2001, a processing module 2002, and a planning and control module 2003. The acquisition module 2001 is configured to acquire an image to be detected; the processing module 2002 is configured to detect obstacles in the image acquired by the acquisition module 2001; and the planning and control module 2003 is configured to receive the output of the processing module 2002 and to plan and control the behavior of the movable platform itself. The system 2000 may be applied to a movable platform such as a vehicle or a robot.
Some terms in the present application are explained below to facilitate understanding.
Three-dimensional projection (3-Dimension projection): refers to the process of mapping points in three-dimensional space onto a two-dimensional plane. In the field of computer vision, three-dimensional projection mainly refers to a process of mapping points in a world space, which may be a vehicle body coordinate system, to a two-dimensional image plane through a camera model (e.g., a pinhole model).
Back-projection (back-projection): the inverse process of three-dimensional projection refers to the process of mapping points in a two-dimensional plane into three-dimensional space. In the field of computer vision, backprojection mainly refers to the process of mapping points in a two-dimensional image plane, through a camera model and some geometric constraints (e.g., ideal ground plane assumptions), into the world space, which may be the vehicle body coordinate system.
Ideal ground plane assumption (flat-earth assumption): the road surface on which the vehicle travels is considered to be an ideal plane. Based on the assumption, the back projection can be realized, namely, an ideal ground plane in world space, namely, a point corresponding to a plane defined on a vehicle body coordinate system, is found from pixel points belonging to the road surface in a two-dimensional image plane.
Convex envelope (convex hull): given a set of points, the convex hull of the set is the minimum-area convex polygon that contains all points in the set. Intuitively, a convex polygon is a polygon without any concave parts.
Non-maximum suppression (NMS): an algorithm that searches for local maxima and removes non-maxima, commonly used for post-processing detection boxes in target detection. The input to the algorithm is a set of candidate boxes and a score for each candidate box; the output is a subset of the candidate boxes. The specific steps are as follows: first, mark all boxes as not suppressed and sort them by score from high to low; then traverse from the highest-scoring box, and for each box that is not suppressed, mark as suppressed every box whose overlap with it exceeds the threshold; finally, return the boxes that are not suppressed.
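A minimal sketch of this procedure, assuming boxes are given as (x1, y1, x2, y2) corner pairs and using an illustrative overlap threshold:

```python
import numpy as np

def box_iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]            # sort by score, descending
    suppressed = np.zeros(len(boxes), dtype=bool)
    keep = []
    for i in order:
        if suppressed[i]:
            continue
        keep.append(i)                          # highest remaining score
        for j in order:
            if j != i and not suppressed[j] \
                    and box_iou(boxes[i], boxes[j]) > iou_thresh:
                suppressed[j] = True            # overlaps too much: suppress
    return keep                                 # indices of retained boxes
```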
Intersection-over-Union (IoU): a concept used in target detection, denoting the overlap ratio of two regions, i.e., the ratio of the area of their intersection to the area of their union. Taking a first region C and a second region G as an example, the IoU between them is calculated as:

IoU = S(area(C) ∩ area(G)) / S(area(C) ∪ area(G))

where S(area(C) ∩ area(G)) is the area of the intersection between the first region C and the second region G, and S(area(C) ∪ area(G)) is the area of the union between them. In an example, if the IoU of the convex envelope contour of a first region of interest among the Q regions of interest and the contour of the pixel region corresponding to that region of interest is greater than a fourth preset value, the first region of interest is taken as one of the R regions of interest. If the fourth preset value is 70%, the region enclosed by the convex envelope contour of the first region of interest is the first region, and the region enclosed by the contour of the corresponding pixel region may be the second region.
Clustering: refers to the process of dividing a collection of physical or abstract objects into classes composed of similar objects. The clusters generated by clustering are a collection of a set of data objects that are similar to objects in the same cluster and distinct from objects in other clusters.
The related technologies for detecting a target obstacle in real time in a traffic scene are mainly classified into two types: the first type is: the target obstacle is detected by sensors such as millimeter wave radar and laser radar and based on physical or geometric principles, such information as distance, speed and azimuth angle of surrounding objects is measured, or three-dimensional point cloud or depth information of surrounding environment is formed. The second type: the visible light images collected by the camera are used for identifying the objects in the picture based on processing analysis and learning of the imaging characteristics of the target object, so that the aim of detecting the target obstacles in the images is fulfilled.
In one method, taking a traffic scene where the target obstacle is a traffic cone as an example: based on supervised learning, a detection and recognition model is trained with pre-collected and labeled training samples of traffic cones, and candidate target areas in a picture are then detected and recognized to achieve the purpose of detecting traffic cones. The specific process is as follows: first, a squeeze-and-excitation network (SENet) and a dense convolutional network (DenseNet) are obtained; then a target network structure is determined based on SENet, DenseNet and a preset target detection model, i.e., a cascade network structure is designed; the target network structure is then trained on a number of original scene images containing traffic cones to obtain a traffic cone recognition model. The image to be recognized is then fed into the traffic cone recognition model and a recognition result is output: either a traffic cone exists in the image together with its position, or no traffic cone exists. However, detecting a target obstacle in this way has the following disadvantages: the algorithmic complexity of the cascade network framework is high and its computing-power requirement is large, so if the computing power of the platform is limited, it is difficult to deploy or to achieve effective real-time detection. Moreover, a traditional target detection framework based on supervised learning needs trained models that depend heavily on their training data, so data collection and labeling at a certain scale are required for each type of target obstacle, and when the number and distribution of training samples are insufficient, effective real-time detection is hard to achieve.
In another method, the objective of detecting a target obstacle is achieved by clustering optical flows in a scene on the premise that optical flows belonging to the same foreground object converge and are significantly different from optical flows of a background under normal conditions. The specific process is as follows: first, the optical flow between adjacent frames is calculated. Then, based on the optical flow field, clustering the pixel points with mutually close positions and similar displacement vectors. The clustering criterion is then: whether there is a common optical flow collection point (focus of expansion) and a common scale (scale map). These clustered optical-flow clusters are then output as foreground target areas, i.e., target obstacle detection results. As shown at 3-1 in fig. 3, an example of a traffic scene is shown, including foreground objects including pedestrians (320) and automobiles (310), and static backgrounds (e.g., ground, roads, etc.); the corresponding feature space (composed of three dimensions of the abscissa X of the optical flow collection point, the abscissa Y of the optical flow collection point, and the optical flow vector scale) is given as 3-2 in fig. 3, and includes the feature point (350) corresponding to the pedestrian, the feature point (340) corresponding to the car, and the feature subspace (330) corresponding to the static background. Optical flows generated by pedestrians crossing a road have a common collection point and the same scale (S2), and thus a feature point (350) is formed in the feature space. The optical flows of the static background have a common collection point, but have a wide scale distribution, so the corresponding feature subspace (330) is cylindrical instead of point-shaped. Since the optical flow collection points of the pedestrians are obviously different from the optical flow collection points of the static background, the feature points (350) corresponding to the pedestrians and the feature subspace (330) corresponding to the static background have larger distances in the feature space, and are easier to distinguish. That is, by clustering the optical flows in the above-described feature space, the pedestrian in the example can be effectively detected. However, the following disadvantages exist in this way: based on the global dense optical flow, the calculation cost is high, and challenges are caused to the real-time performance of the algorithm; the first premise that an object can be detected is that a corresponding optical flow exists, and if optical flow calculation fails due to common reasons of single or repeated object textures, motion blur, overlarge inter-frame displacement beyond a search range, relative rest of a vehicle and the like, detection cannot be performed; the detection performance depends on the precision of the optical flow, and if the estimation on the optical flow collection points and the scale is not accurate enough, effective clustering cannot be formed; for an object that is moving straight ahead of the vehicle, either stationary or parallel to the direction of motion of the vehicle, effective detection cannot be made. As shown in fig. 3, the optical flows generated by the front cars (310) have a common collection point and the same dimension (S1), and thus a feature point (340) is formed in the feature space. 
However, since the vehicle is in the above-mentioned motion state (stationary or parallel to the vehicle motion direction), the optical flow collection point is exactly the same as that of the static background, resulting in that in the feature space, the corresponding feature point (340) is contained by the feature subspace (330) corresponding to the static background. That is, by clustering the optical flows in the above feature space, the car in the example cannot be detected efficiently.
Based on this, the embodiments of the present application propose the following solutions.
The method is performed by a movable platform, which may be a vehicle, a robot, or the like.
Referring to fig. 4, fig. 4 is a diagram illustrating a target detection method according to an embodiment of the present application, where the method includes, but is not limited to, the following steps:
step S401: a first image is acquired.
Specifically, the first image may be acquired in two ways: when the method of the embodiment of the application is applied to a chip inside a camera, acquiring the first image means obtaining an image shot by the camera; when the method is applied to a chip other than a camera, acquiring the first image means receiving an image from the camera. The camera may be a monocular camera, a binocular camera, a multi-view camera, a panoramic camera, or the like, which is not limited here.
Step S402: process the first image to obtain Q regions of interest, and determine the coordinates of the image coordinate system corresponding to the reference point in each region of interest.
Specifically, Q is a positive integer, and the Q regions of interest may be obtained by processing the first image either by limiting its color range or by extracting edges. The color-range approach works as follows: the color space of the first image is converted to hue, saturation, value (HSV), and the ranges of the three dimensions hue, saturation and value are then limited to obtain the Q regions of interest. The edge-extraction approach works as follows: in general, the pixel-value distributions of an object and of the road surface differ, so the boundary of an object often exhibits a distinct edge feature caused by large pixel-value changes; processing the first image with an edge operator therefore yields the Q regions of interest.
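A sketch of the edge-based variant; the Canny operator and its thresholds are assumed choices for illustration, not prescribed by the patent:

```python
import cv2

def rois_from_edges(gray, low=50, high=150):
    """gray: single-channel image. Returns candidate region-of-interest
    contours found from strong pixel-value changes at object boundaries."""
    edges = cv2.Canny(gray, low, high)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return contours
```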
Specifically, the reference point in each region of interest may be the bottom midpoint of the region, or another point in the region, which is not limited here, as long as a uniform convention for choosing the point is adopted. The image coordinate system can be divided into an image pixel coordinate system and an image physical coordinate system. The image physical coordinate system, with origin O1, has X and Y axes respectively parallel to the Xc and Yc axes of the camera coordinate system and is a planar rectangular coordinate system. The image pixel coordinate system is a planar rectangular coordinate system fixed on the image, with origin O0; its u and v axes are respectively parallel to the X and Y axes of the image physical coordinate system, and the coordinates of the principal point in the u-v coordinate system are (u0, v0), as shown in fig. 5. The coordinates of an image point in the image physical coordinate system and in the image pixel coordinate system can be converted into each other.

The specific conversion is as follows. Assume a point on the image has coordinates (x, y) in the image physical coordinate system and (u, v) in the image pixel coordinate system, (u0, v0) are the pixel coordinates of the principal point in the image pixel coordinate system, and du, dv are the physical dimensions of one pixel along the X and Y axes respectively. Then

u = x/du + u0,  v = y/dv + v0    (1)

Formula (1) can be expressed as a matrix multiplication:

[u, v, 1]^T = [ [1/du, 0, u0], [0, 1/dv, v0], [0, 0, 1] ] · [x, y, 1]^T    (2)
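Formula (2) as executable code; the pixel size (du, dv) and principal point (u0, v0) below are illustrative values for a hypothetical camera, not parameters from the patent:

```python
import numpy as np

du, dv = 4.8e-6, 4.8e-6          # physical pixel size in metres (assumed)
u0, v0 = 960.0, 540.0            # principal point in pixels (assumed)

# The matrix of formula (2).
M = np.array([[1 / du, 0.0, u0],
              [0.0, 1 / dv, v0],
              [0.0, 0.0, 1.0]])

def physical_to_pixel(x, y):
    """Convert image physical coordinates (x, y) to pixel coordinates (u, v)."""
    u, v, _ = M @ np.array([x, y, 1.0])
    return u, v
```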
specific examples of the Q regions of interest obtained by processing the first image are as follows:
in one example, a first image is shown in FIG. 6, which includes a traffic cone, the color space of the first image is converted into HSV, and the ranges of three dimensions of hue H, saturation S and brightness V are defined, in the embodiment of the present application, the orange-red subspace in the HSV color space is defined as 0 DEG ≦ H ≦ 10 DEG, 160 DEG ≦ H ≦ 180 DEG; s is more than or equal to 70 and less than or equal to 255; v is more than or equal to 100 and less than or equal to 255, thereby obtaining Q interested areas.
In yet another example, a first image is shown in FIG. 7, which includes a triangle warning sign for a motor vehicle, the color space of the first image is converted into HSV, and the ranges of three dimensions, hue H, saturation S, and brightness V, are defined, in the embodiment of the present application, the orange-red subspace in the HSV color space is defined as 0 DEG ≦ H ≦ 10 DEG, 160 DEG ≦ H ≦ 180 DEG; s is more than or equal to 70 and less than or equal to 255; v is more than or equal to 100 and less than or equal to 255, thereby obtaining Q interested areas.
In yet another example, a first image is shown in FIG. 8, which includes a lying tire, the color space of the first image is converted into HSV, and the three dimensions of hue H, saturation S and brightness V are defined, in this embodiment, the black subspace in the HSV color space is defined as 0 ≦ H ≦ 120 °; s is more than or equal to 0 and less than or equal to 100; v is more than or equal to 100 and less than or equal to 20, thereby obtaining Q interested areas.
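A sketch of the colour-range step using the orange-red bounds quoted above, assuming the hue bounds are expressed in OpenCV's H in [0, 180] convention (so 160 to 180 covers the red wrap-around):

```python
import cv2
import numpy as np

def rois_from_color(bgr):
    """Returns one array of (row, col) pixel coordinates per connected
    orange-red component, i.e., one candidate region of interest each."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    low_red = cv2.inRange(hsv, (0, 70, 100), (10, 255, 255))
    high_red = cv2.inRange(hsv, (160, 70, 100), (180, 255, 255))
    mask = cv2.bitwise_or(low_red, high_red)
    n, labels = cv2.connectedComponents(mask)
    return [np.argwhere(labels == i) for i in range(1, n)]  # label 0 is background
```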
Optionally, in a possible implementation manner, after the first image is processed to obtain one or more regions of interest, if the number of pixel points contained in a region of interest is less than S, that region of interest is filtered out.
Specifically, S is a positive integer.
In one example, assume the first image is shown in fig. 6 and S is 5. Processing the first image yields Q + 1 regions of interest, one of which contains 2 pixel points; that region is filtered out, and the Q remaining regions of interest are shown in fig. 9.
In another example, assume the first image is shown in fig. 7 and S is 5. Processing the first image yields Q + 2 regions of interest, one of which contains 2 pixel points and another of which contains 3 pixel points; those two regions are filtered out, and the Q remaining regions of interest are shown in fig. 10.
In yet another example, assume the first image is shown in fig. 8 and S is 15. Processing the first image yields Q + 1 regions of interest, one of which contains 10 pixel points; that region is filtered out, and the Q remaining regions of interest are shown in fig. 11.
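With each candidate region represented as an array of its pixel coordinates (as in the colour-range sketch above), the filtering rule reduces to a short predicate:

```python
def filter_small(rois, S=5):
    # Keep only regions of interest containing at least S pixel points.
    return [roi for roi in rois if len(roi) >= S]
```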
Step S403: determine the coordinates of the vehicle body coordinate system corresponding to the reference point in each region of interest according to the coordinates of the image coordinate system corresponding to that reference point.
Specifically, this process may be referred to simply as back-projection. As shown in fig. 12, assuming that the reference point in each region of interest corresponds to a point on the ground, the purpose of back-projection is to find the vehicle-body-coordinate-system coordinates corresponding to that assumed ground point. In the embodiment of the present application, the origin O3 of the vehicle body coordinate system is the projection of the center point of the rear axle of the ego vehicle onto the ideal ground plane, i.e., the plane defined in the vehicle body coordinate system, and the coordinate axes are Xw forward, Yw leftward and Zw upward.
Specifically, according to the extrinsic parameters, intrinsic matrix and scale parameters of the camera, the coordinates of the vehicle body coordinate system corresponding to the reference point in each region of interest are determined from the coordinates of the image coordinate system corresponding to that reference point; that is, the reference point in each region of interest is converted from the image coordinate system to the vehicle body coordinate system. The specific conversion is as follows:
First, the reference point in each region of interest is normalized in the image coordinate system. The formula is:

E_norm = K^(-1) · e    (3)

where E_norm denotes the coordinates, in the normalized image coordinate system, of the reference point in each region of interest, K is the intrinsic matrix of the camera, and e is the image-coordinate-system coordinates of the reference point in each region of interest.
Then, as shown in fig. 13, a ray is constructed starting from the camera origin Oc and passing through the reference point in each region of interest in the image coordinate system; its expression in the camera coordinate system is Ray(t1) = (x_i·t1, y_i·t1, t1), where x_i and y_i denote the abscissa and ordinate of the image coordinate system corresponding to the reference point in each region of interest, and t1 denotes a coefficient.
An ideal ground plane is defined in the vehicle body coordinate system. Since the origin O3 of the vehicle body coordinate system is the projection of the center point of the rear axle of the ego vehicle onto the ideal ground plane, and the coordinate axes are Xw forward, Yw leftward and Zw upward, the ideal ground plane in the vehicle body coordinate system can be determined by the normal vector n = [0, 0, 1] and the origin O3 = (0, 0, 0). Using the transformation matrix from the vehicle body coordinate system to the camera coordinate system, the normal vector n and the origin O3 are converted into the camera coordinate system, giving the corresponding expression of the ideal ground plane in the camera coordinate system, Ax + By + Cz + D = 0, where A, B, C, D are known constants and A, B, C are not all zero.
Combining the expression of the ray with the corresponding expression of the ideal ground plane in the camera coordinate system gives

t1 = -D / (A·x_i + B·y_i + C)

Substituting t1 into the expression of the ray then yields the intersection of the ray and the plane, i.e., the point in the camera coordinate system corresponding to the reference point in each region of interest in the image coordinate system. The point in the vehicle body coordinate system is then obtained through the transformation matrix from the camera coordinate system to the vehicle body coordinate system. Through this process, the coordinates of the vehicle body coordinate system corresponding to the reference point in each region of interest are determined according to the coordinates of the image coordinate system corresponding to that reference point.
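The back-projection just described can be sketched as follows, assuming the intrinsic matrix K and a 4x4 body-to-camera transform T_cam_from_body are known from calibration; the names are illustrative, not from the patent:

```python
import numpy as np

def back_project(u, v, K, T_cam_from_body):
    """Map pixel (u, v) onto the ideal ground plane and return the point
    in the vehicle body coordinate system."""
    # Normalise the pixel: E_norm = K^-1 * e  (formula (3)).
    x_i, y_i, _ = np.linalg.inv(K) @ np.array([u, v, 1.0])

    # Ideal ground plane: normal n = (0, 0, 1) through the body origin,
    # expressed in camera coordinates as A*x + B*y + C*z + D = 0.
    R, t = T_cam_from_body[:3, :3], T_cam_from_body[:3, 3]
    n_cam = R @ np.array([0.0, 0.0, 1.0])   # (A, B, C)
    D = -n_cam @ t                          # plane passes through the body origin

    # Ray from the camera origin: Ray(t1) = (x_i*t1, y_i*t1, t1);
    # intersect it with the plane to get t1.
    t1 = -D / (n_cam @ np.array([x_i, y_i, 1.0]))
    p_cam = t1 * np.array([x_i, y_i, 1.0])

    # Transform the intersection point into the vehicle body coordinate system.
    p_body = np.linalg.inv(T_cam_from_body) @ np.append(p_cam, 1.0)
    return p_body[:3]
```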
In one example, as shown in fig. 14, assume the positions of the image-coordinate-system coordinates of the reference points in the regions of interest are 1001, 1002, 1003, 1004, 1005 and 1006. Then, determined from the camera's extrinsic parameters, intrinsic matrix and scale parameters, the positions of the corresponding vehicle-body-coordinate-system coordinates are 1007, 1008, 1009, 1010, 1011 and 1012, respectively. The correspondence between the two sets of positions is shown in Table 1.
Table 1

Position of image-coordinate-system coordinates | Position of body-coordinate-system coordinates
1001 | 1007
1002 | 1008
1003 | 1009
1004 | 1010
1005 | 1011
1006 | 1012
Step S404: a three-dimensional model of the first target object is determined.
Specifically, the three-dimensional model of the first target object may be obtained by three-dimensional modeling of a traffic cone, a triangular warning sign for motor vehicles, a flat-lying tire, and the like, which is not limited here. Since the specific parameters of traffic cones, triangular warning signs for motor vehicles and tires are clearly specified in international or national standards, a three-dimensional model coordinate system can be defined with these objects as references, so as to determine the vertex set of the three-dimensional model of the first target object.
In one example, as shown in fig. 15, assume that the three-dimensional model of the first target object is obtained by three-dimensional modeling of a traffic cone (101). The traffic cone may be represented as a cone (201) with a base-circle radius R1 of 0.15 m and a height H1 of 0.7 m, as follows: a three-dimensional model coordinate system is defined with the center of the first base circle as the origin (0,0,0) and the coordinate axes X forward, Y leftward and Z upward, so that the vertex set of the three-dimensional model of the first target object includes a first upper vertex (0, 0, H1) and n equal-division points on the first base circle corresponding to the first upper vertex, (R1·cos(2πk/n), R1·sin(2πk/n), 0), where H1 denotes the height, R1 denotes the radius of the first base circle, k = 0, 1, 2, …, n−1, and n is a positive integer. In the embodiment of the present application, n is 36.
In yet another example, as shown in fig. 16, the triangular warning sign for motor vehicles may be represented as an equilateral triangle (203) with a side length L of 0.5 m. Assume that the three-dimensional model of the first target object is obtained by three-dimensional modeling of the triangular warning sign for motor vehicles (103), as follows: a three-dimensional model coordinate system is defined with the center of the bottom edge as the origin (0,0,0) and the coordinate axes X forward, Y leftward and Z upward, so that the vertex set of the three-dimensional model of the first target object includes a second upper vertex (0, 0, L×cos(π/3)), a left vertex (0, L/2, 0) and a right vertex (0, −L/2, 0), where L denotes the side length.
In yet another example, as shown in fig. 17, the flat-lying tire may be represented as a cylinder (205) with a base-circle radius R2 of 0.356 m and a height H2 of 0.125 m. Assume that the three-dimensional model of the first target object is obtained by three-dimensional modeling of the flat-lying tire (105), as follows: a three-dimensional model coordinate system is defined with the center of the second base circle as the origin (0,0,0) and the coordinate axes X forward, Y leftward and Z upward, so that the vertex set of the three-dimensional model of the first target object includes m equal-division points on the second base circle, (R2·cos(2πk/m), R2·sin(2πk/m), 0), and m equal-division points on the top circle corresponding to the second base circle, (R2·cos(2πk/m), R2·sin(2πk/m), H2), where H2 denotes the height, R2 denotes the radius of the second base circle, k = 0, 1, 2, …, m−1, and m is a positive integer. In this embodiment, m is 36.
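For illustration, the three vertex sets may be generated as below, assuming the equal-division points are parameterized by the angles 2πk/n (respectively 2πk/m) as written above; the dimensions follow the examples in the text and all function names are illustrative:

```python
import numpy as np

def cone_vertices(H1: float = 0.7, R1: float = 0.15, n: int = 36) -> np.ndarray:
    """First upper vertex plus n equal-division points on the base circle."""
    k = np.arange(n)
    base = np.stack([R1 * np.cos(2 * np.pi * k / n),
                     R1 * np.sin(2 * np.pi * k / n),
                     np.zeros(n)], axis=1)
    return np.vstack([[0.0, 0.0, H1], base])

def triangle_vertices(L: float = 0.5) -> np.ndarray:
    """Second upper vertex plus left and right vertices of the warning sign."""
    return np.array([[0.0, 0.0, L * np.cos(np.pi / 3)],
                     [0.0,  L / 2, 0.0],
                     [0.0, -L / 2, 0.0]])

def tire_vertices(H2: float = 0.125, R2: float = 0.356, m: int = 36) -> np.ndarray:
    """m equal-division points on the bottom circle and on the top circle."""
    k = np.arange(m)
    ring = np.stack([R2 * np.cos(2 * np.pi * k / m),
                     R2 * np.sin(2 * np.pi * k / m)], axis=1)
    return np.vstack([np.hstack([ring, np.zeros((m, 1))]),
                      np.hstack([ring, np.full((m, 1), H2)])])
```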
Step S405: and determining at least one coordinate of the three-dimensional model of the first target object corresponding to the vehicle body coordinate system according to the coordinate of the vehicle body coordinate system corresponding to the reference point in each region of interest and the vertex set of the three-dimensional model of the first target object.
Specifically, the process may be considered as placing the three-dimensional model of the first target object at the position of the coordinates of the vehicle body coordinate system corresponding to the reference point in each region of interest, that is, the coordinates of the three-dimensional model coordinate system corresponding to all points in the set of vertices of the three-dimensional model of the first target object are translated from the three-dimensional model coordinate system to the vehicle body coordinate system.
In one example, assume that the coordinates of the vehicle body coordinate system corresponding to the reference point in one region of interest are (X_GP, Y_GP, Z_GP), and that the coordinates of a point in the vertex set of the three-dimensional model of the first target object in the three-dimensional model coordinate system are (P_X, P_Y, P_Z). Placing the three-dimensional model of the first target object at that reference point, the coordinates of that vertex in the vehicle body coordinate system are (X_GP + P_X, Y_GP + P_Y, Z_GP + P_Z). In this way, the coordinates in the vehicle body coordinate system of all points in the vertex set of the three-dimensional model of the first target object, that is, the at least one coordinate of the three-dimensional model of the first target object corresponding to the vehicle body coordinate system, can be determined.
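A one-line sketch of this placement step (illustrative, not the patent's code); `vertices_model` is the (K, 3) vertex set and `ref_body` the body-frame reference point:

```python
import numpy as np

def place_model(vertices_model: np.ndarray, ref_body: np.ndarray) -> np.ndarray:
    """(P_X, P_Y, P_Z) -> (X_GP + P_X, Y_GP + P_Y, Z_GP + P_Z) for every vertex."""
    return vertices_model + ref_body  # broadcasts (K, 3) + (3,)
```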
In one example, fig. 18 shows the process of placing the three-dimensional model of the first target object at the position of the coordinates of the vehicle body coordinate system corresponding to the reference point in each region of interest; that is, the reference point in each region of interest corresponds to one three-dimensional model of the first target object, and Q regions of interest correspond to Q three-dimensional models of the first target object. Specifically, the correspondence between the position of the coordinates of the image coordinate system corresponding to the reference point in each region of interest, the position of the coordinates of the vehicle body coordinate system corresponding to that reference point, and the position of the at least one coordinate of the three-dimensional model of the first target object corresponding to the vehicle body coordinate system is shown in table 2.
Table 2

Position in the image coordinate system | Position in the vehicle body coordinate system | Position of the three-dimensional model
1001 | 1007 | 1107
1002 | 1008 | 1108
1003 | 1009 | 1109
1004 | 1010 | 1110
1005 | 1011 | 1111
1006 | 1012 | 1112
Step S406: and projecting at least one coordinate of the three-dimensional model of the first target object corresponding to the vehicle body coordinate system in the image coordinate system to obtain Q pixel regions.
Specifically, this process may be referred to as three-dimensional projection. According to the extrinsic parameters, the intrinsic matrix and the scale parameter of the camera, the at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system is converted from the vehicle body coordinate system into the image coordinate system, that is, the at least one coordinate of the three-dimensional model of the first target object in the image coordinate system is determined; the point set of the at least one coordinate of the three-dimensional model of the first target object in the image coordinate system is then outlined, so as to obtain Q pixel regions, where Q is a positive integer.
Specifically, at least one coordinate of the three-dimensional model of the first target object in the image coordinate system is determined according to at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system, which is specifically as follows:
First, the relationship between the image physical coordinate system O-xy and the camera coordinate system Oc-XcYcZc:

As shown in fig. 19, assume that the coordinates of an object point P in space in the camera coordinate system are (Xc, Yc, Zc). The coordinates of the corresponding image point p of the point P in the image physical coordinate system are then:

x = f·Xc/Zc, y = f·Yc/Zc

where (Xc, Yc, Zc) denotes the coordinates of the object point P in the camera coordinate system, f denotes the focal length of the camera, and Zc denotes the scale parameter.
Second, the relationship between the camera coordinate system Oc-XcYcZc and the vehicle body coordinate system Ow-XwYwZw:

[Xc, Yc, Zc, 1]^T = [R3, t; 0^T, 1]·[Xw, Yw, Zw, 1]^T

where 0^T = (0, 0, 0)^T, R3 is a rotation matrix, t is a displacement vector, (Xc, Yc, Zc) denotes the coordinates of an object point P in space in the camera coordinate system, and (Xw, Yw, Zw) denotes the coordinates of the object point P in the vehicle body coordinate system.
Then, the relationship between the image pixel coordinate system and the vehicle body coordinate system is determined according to the formula (2), the formula (5), and the formula (6):
Zc·[u, v, 1]^T = K·[R3, t]·[Xw, Yw, Zw, 1]^T    (7)

Thus, the at least one coordinate of the three-dimensional model of the first target object corresponding to the image coordinate system may be determined according to formula (7), that is, from the at least one coordinate of the three-dimensional model of the first target object corresponding to the vehicle body coordinate system, where [u, v]^T denotes the at least one coordinate of the three-dimensional model of the first target object corresponding to the image coordinate system, [Xw, Yw, Zw] denotes the at least one coordinate of the three-dimensional model of the first target object corresponding to the vehicle body coordinate system, Zc denotes the scale parameter, that is, the depth of the corresponding point of the three-dimensional model in the camera coordinate system, K is the camera's internal reference (intrinsic) matrix, R3 is the rotation matrix, and t is the displacement vector.
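Formula (7) can be sketched in code as follows, assuming K is the usual 3×3 pinhole intrinsic matrix and the extrinsics are R3 and t as above; names are illustrative:

```python
import numpy as np

def project_to_pixels(X_w: np.ndarray, K: np.ndarray,
                      R3: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Apply Zc [u, v, 1]^T = K [R3 | t] [Xw, Yw, Zw, 1]^T to an (N, 3)
    array of body-frame vertices; returns an (N, 2) array of pixel coords."""
    X_c = X_w @ R3.T + t                # body frame -> camera frame
    uvw = X_c @ K.T                     # camera frame -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]     # divide by the scale parameter Zc
```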
In one example, assume that the positions of the at least one coordinate of the three-dimensional model of the first target object corresponding to the vehicle body coordinate system are 1107, 1108, 1109, 1110, 1111 and 1112, respectively. Projecting these coordinates in the image coordinate system yields Q pixel regions; as shown in fig. 20, the pixel regions corresponding to positions 1107, 1108, 1109, 1110, 1111 and 1112 are 1201, 1202, 1203, 1204, 1205 and 1206, respectively. The Q pixel regions are illustrated schematically in the first image as shown in fig. 21. Specifically, the correspondence between the position of the coordinates of the image coordinate system corresponding to the reference point in each region of interest, the position of the coordinates of the vehicle body coordinate system corresponding to that reference point, the position of the at least one coordinate of the three-dimensional model of the first target object corresponding to the vehicle body coordinate system, and the Q pixel regions is shown in table 3.
Table 3

Position in the image coordinate system | Position in the vehicle body coordinate system | Position of the three-dimensional model | Pixel region
1001 | 1007 | 1107 | 1201
1002 | 1008 | 1108 | 1202
1003 | 1009 | 1109 | 1203
1004 | 1010 | 1110 | 1204
1005 | 1011 | 1111 | 1205
1006 | 1012 | 1112 | 1206
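For illustration, one way to realize the "outlining" of a projected vertex set into a pixel region is to fill its convex hull into a binary mask; this sketch uses OpenCV and is an assumption about the implementation, not the patent's code:

```python
import cv2
import numpy as np

def pixel_region_mask(uv: np.ndarray, image_shape) -> np.ndarray:
    """Fill the convex hull of the projected (N, 2) vertex set `uv`,
    returning one pixel region as a binary mask of the given image shape."""
    hull = cv2.convexHull(uv.astype(np.int32))       # hull points, shape (M, 1, 2)
    mask = np.zeros(image_shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(mask, hull, 255)
    return mask
```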
In one example, assume that a first image is shown in fig. 6, and that Q regions of interest resulting from processing the first image are shown in fig. 9. Then determining the coordinates of the vehicle body coordinate system corresponding to the reference point in each interested area according to the coordinates of the image coordinate system corresponding to the reference point in each interested area; determining that the three-dimensional model of the first target object is obtained by three-dimensional modeling of a triangular warning board for a motor vehicle; then determining at least one coordinate of the three-dimensional model of the first target object corresponding to the vehicle body coordinate system according to the coordinate of the vehicle body coordinate system corresponding to the reference point in each region of interest and the vertex set of the three-dimensional model of the first target object; projecting at least one coordinate of the three-dimensional model of the first target object corresponding to the vehicle body coordinate system in the image coordinate system to obtain Q pixel regions, as shown in fig. 22.
In one example, assume that a first image is shown in fig. 7, and that Q regions of interest resulting from processing the first image are shown in fig. 10. Then the coordinates of the vehicle body coordinate system corresponding to the reference point in each region of interest are determined according to the coordinates of the image coordinate system corresponding to the reference point in each region of interest; it is determined that the three-dimensional model of the first target object is obtained by three-dimensional modeling of a flat-lying tire; at least one coordinate of the three-dimensional model of the first target object corresponding to the vehicle body coordinate system is then determined according to the coordinates of the vehicle body coordinate system corresponding to the reference point in each region of interest and the vertex set of the three-dimensional model of the first target object; and the at least one coordinate of the three-dimensional model of the first target object corresponding to the vehicle body coordinate system is projected in the image coordinate system to obtain Q pixel regions, as shown in fig. 23.
Step S407: and determining the detection result of the target according to the Q interested areas and the Q pixel areas.
Specifically, the Q regions of interest corresponding to the Q pixel regions are screened to obtain R regions of interest, where each pixel region corresponds to one region of interest, R is a positive integer and R ≤ Q; R circumscribed rectangles corresponding to the R regions of interest are then determined; and the detection result of the target is determined according to the R circumscribed rectangles. Associating each pixel region with a region of interest can be considered a clustering process. Assume that projecting the at least one coordinate of the three-dimensional model of the first target object corresponding to the vehicle body coordinate system in the image coordinate system yields Q pixel regions; as shown in fig. 20, the 6 pixel regions are pixel region 1201, pixel region 1202, pixel region 1203, pixel region 1204, pixel region 1205 and pixel region 1206, respectively. The region of interest corresponding to pixel region 1201 refers to the white region portion contained in pixel region 1201, the region of interest corresponding to pixel region 1202 refers to the white region portion contained in pixel region 1202, and the regions of interest corresponding to pixel regions 1203, 1204, 1205 and 1206 are defined analogously and are not repeated here.
In a possible implementation manner, the screening processing on the Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest may include the following 5 screening rules:
rule 1: and when the proportion of the area of a first interested area in the Q interested areas to the convex envelope area of the first interested area is larger than a first preset value, taking the first interested area as one of the R interested areas. In one example, the first preset value may be 50%, and assuming that the area of a first region of interest of the Q regions of interest is 20 pixels (pixels), and the convex envelope area of the first region of interest is 25 pixels (pixels), since the ratio of the area of the first region of interest 20(pixels) to the convex envelope area of the first region of interest 25(pixels) is 80%, and 80% is greater than 50%, the first region of interest remains as one of the R regions of interest.
Rule 2: and when the ratio of the convex hull aspect ratio of a first interested area in the Q interested areas to the aspect ratio of a pixel area corresponding to the first interested area meets the condition that the ratio is greater than a second preset value and less than a third preset value, taking the first interested area as one of the R interested areas. In one example, the second preset value is 0.5 and the third preset value is 2.
Rule 3: and when the ratio of the convex envelope area of the first interested area in the Q interested areas to the area of the pixel area corresponding to the first interested area meets the condition that the ratio is greater than the second preset value and less than the third preset value, taking the first interested area as one of the R interested areas. In one example, the second preset value is 0.5 and the third preset value is 2.
Rule 4: and when the intersection ratio IOU of the convex envelope contour of the first interested area in the Q interested areas and the contour of the pixel area corresponding to the first interested area is greater than a fourth preset value, taking the first interested area as one of the R interested areas. In one example, the fourth preset value is 70%.
Rule 5: evaluating and scoring the content in a circumscribed rectangle of a first interested region in the Q interested regions by using a pre-trained classifier to obtain a first score; and when the first score is higher than a fifth preset value, taking the first interested area as one of the R interested areas.
Rules 1, 2, 3, 4 and 5 above may be combined arbitrarily, without limitation here. For example, rules 1, 2, 3 and 4 may be required simultaneously: when the ratio of the area of the first region of interest to its convex envelope area is greater than the first preset value, the ratio of its convex envelope aspect ratio to the aspect ratio of the corresponding pixel region is greater than the second preset value and less than the third preset value, the ratio of its convex envelope area to the area of the corresponding pixel region is greater than the second preset value and less than the third preset value, and the intersection-over-union IOU of its convex envelope contour and the contour of the corresponding pixel region is greater than the fourth preset value, the first region of interest is taken as one of the R regions of interest, as sketched below.
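A sketch of this joint screening with the example thresholds from the text (50%, 0.5, 2 and 70%); the arguments are assumed to be precomputed geometric quantities, and the function itself is illustrative:

```python
def passes_rules_1_to_4(roi_area: float, hull_area: float, hull_aspect: float,
                        pix_aspect: float, pix_area: float, iou: float) -> bool:
    """Rules 1-4 combined; a region of interest is kept only if all hold."""
    rule1 = roi_area / hull_area > 0.50           # area vs. convex envelope area
    rule2 = 0.5 < hull_aspect / pix_aspect < 2.0  # aspect-ratio consistency
    rule3 = 0.5 < hull_area / pix_area < 2.0      # area consistency
    rule4 = iou > 0.70                            # contour intersection-over-union
    return rule1 and rule2 and rule3 and rule4
```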
In a possible implementation manner, determining the detection result of the target according to the R circumscribed rectangles includes: calculating the areas of the R circumscribed rectangles; and determining the detection result of the target according to the areas of the R circumscribed rectangles.
Specifically, a non-maximum suppression algorithm may be employed to determine the detection result of the target. First, all of the R circumscribed rectangles are marked as not suppressed; the R circumscribed rectangles are then sorted by area from largest to smallest; traversal starts from the circumscribed rectangle with the largest area, and for each circumscribed rectangle that is not suppressed, all circumscribed rectangles whose degree of coincidence with it is greater than a threshold are marked as suppressed; finally, the circumscribed rectangles that are not suppressed are returned. Of course, other methods may also be used to determine the detection result of the target according to the areas of the R circumscribed rectangles, and the embodiment of the present application is not limited thereto.
In a possible implementation manner, determining the detection result of the target according to the R circumscribed rectangles includes: evaluating and scoring the contents in the R circumscribed rectangles by using a pre-trained classifier to obtain R scores; and determining the detection result of the target according to the R scores.
Specifically, after the R scores are obtained, a non-maximum suppression algorithm may be used to determine the detection result of the target. First, all of the R circumscribed rectangles are marked as not suppressed; the R circumscribed rectangles are then sorted by score from largest to smallest; traversal starts from the circumscribed rectangle with the highest score, and for each circumscribed rectangle that is not suppressed, all circumscribed rectangles whose degree of coincidence with it is greater than a threshold are marked as suppressed; finally, the circumscribed rectangles that are not suppressed are returned. Of course, other methods may also be used to determine the detection result of the target according to the R scores, and the embodiment of the present application is not limited thereto. A sketch covering both orderings follows.
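The greedy procedure described in both approaches can be sketched as follows; `keys` carries either the rectangle areas (first approach) or the classifier scores (second approach), and `overlap` is assumed to be a user-supplied measure of the degree of coincidence (e.g. IOU) between two [x1, y1, w1, h1] boxes:

```python
def non_maximum_suppression(boxes, keys, overlap, threshold):
    """Greedy NMS: keep the box with the largest key, suppress every box
    whose overlap with it exceeds the threshold, and repeat."""
    order = sorted(range(len(boxes)), key=lambda i: keys[i], reverse=True)
    suppressed = [False] * len(boxes)
    kept = []
    for i in order:
        if suppressed[i]:
            continue
        kept.append(boxes[i])
        for j in order:
            if j != i and not suppressed[j] and overlap(boxes[i], boxes[j]) > threshold:
                suppressed[j] = True
    return kept
```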
In one example, assume that projecting the at least one coordinate of the three-dimensional model of the first target object corresponding to the vehicle body coordinate system in the image coordinate system yields Q pixel regions, as shown in fig. 20, with Q = 6 and each pixel region corresponding to one region of interest. The regions of interest corresponding to the 6 pixel regions are screened by the screening rule that rules 1, 2, 3 and 4 must be satisfied simultaneously. The regions of interest corresponding to pixel regions 1201, 1202 and 1206 in fig. 20 are determined not to satisfy the screening rule and are filtered out, while the regions of interest corresponding to pixel regions 1203, 1204 and 1205 satisfy the screening rule and are retained. Circumscribed rectangles (detection boxes) are then drawn for the regions of interest corresponding to pixel regions 1203, 1204 and 1205, namely circumscribed rectangle 1401, circumscribed rectangle 1402 and circumscribed rectangle 1403, in the format [x1, y1, w1, h1], where (x1, y1) are the image coordinates of the top-left corner of the rectangular box and (w1, h1) are the pixel width and height of the rectangular box, as shown in fig. 24.
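For illustration, the [x1, y1, w1, h1] detection box of a binary region-of-interest mask can be computed as follows (a sketch; representing each region of interest as a mask is an assumption):

```python
import numpy as np

def circumscribed_rectangle(mask: np.ndarray):
    """Return [x1, y1, w1, h1], where (x1, y1) is the top-left image
    coordinate and (w1, h1) the pixel width and height of the box;
    assumes the mask contains at least one nonzero pixel."""
    xs = np.flatnonzero(mask.any(axis=0))   # columns containing ROI pixels
    ys = np.flatnonzero(mask.any(axis=1))   # rows containing ROI pixels
    x1, y1 = int(xs[0]), int(ys[0])
    return [x1, y1, int(xs[-1]) - x1 + 1, int(ys[-1]) - y1 + 1]
```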
In the first approach, the areas of circumscribed rectangle 1401, circumscribed rectangle 1402 and circumscribed rectangle 1403 are calculated; according to a non-maximum suppression algorithm that gives priority to larger areas, circumscribed rectangles 1402 and 1403 are determined to be removed, so that circumscribed rectangle 1401 is output, finally yielding the detection result of the target, that is, the position of the traffic cone in the first image, as shown in fig. 25.
In the second approach, a pre-trained classifier is used to evaluate and score the contents of circumscribed rectangles 1401, 1402 and 1403, yielding 3 scores. Then, with none of the circumscribed rectangles initially suppressed, they are sorted by score from largest to smallest; traversal starts from the circumscribed rectangle with the highest score, and for each circumscribed rectangle that is not suppressed, all circumscribed rectangles whose degree of coincidence with it is greater than a threshold are set as suppressed; finally, the circumscribed rectangle that is not suppressed, that is, circumscribed rectangle 1401, is returned, finally yielding the detection result of the target, that is, the position of the traffic cone in the first image, as shown in fig. 25.
In one example, assume that projecting the at least one coordinate of the three-dimensional model of the first target object corresponding to the vehicle body coordinate system in the image coordinate system yields Q pixel regions, as shown in fig. 22, with Q = 2 and each pixel region corresponding to one region of interest. The regions of interest corresponding to the 2 pixel regions are screened by the screening rule that rules 1, 2, 3 and 4 must be satisfied simultaneously. The region of interest corresponding to pixel region 1902 in fig. 22 is determined not to satisfy the screening rule and is filtered out, while the region of interest corresponding to pixel region 1901 satisfies the screening rule and is retained. A circumscribed rectangle, that is, a detection box, is then drawn for the region of interest corresponding to pixel region 1901, in the format [x1, y1, w1, h1], where (x1, y1) are the image coordinates of the top-left corner of the rectangular box and (w1, h1) are the pixel width and height of the rectangular box. The detection result of the target, that is, the position of the triangular warning sign for motor vehicles in the first image, is then determined from the circumscribed rectangle, as shown in fig. 26.
In one example, assume that projecting the at least one coordinate of the three-dimensional model of the first target object corresponding to the vehicle body coordinate system in the image coordinate system yields Q pixel regions, as shown in fig. 23, with each pixel region corresponding to one region of interest. The regions of interest corresponding to the Q pixel regions are screened by the screening rule that rules 1, 2, 3 and 4 must be satisfied simultaneously. The region of interest corresponding to pixel region 2501 in fig. 23 is determined to satisfy the screening rule and is retained, while the regions of interest corresponding to the other pixel regions do not satisfy the screening rule and are filtered out. A circumscribed rectangle, that is, a detection box, is then drawn for the region of interest corresponding to pixel region 2501, in the format [x1, y1, w1, h1], where (x1, y1) are the image coordinates of the top-left corner of the rectangular box and (w1, h1) are the pixel width and height of the rectangular box. The detection result of the target, that is, the position of the flat-lying tire in the first image, is then determined from the circumscribed rectangle, as shown in fig. 27.
The method of the embodiments of the present application is explained in detail above, and the apparatus of the embodiments of the present application is provided below.
Referring to fig. 28, fig. 28 is a schematic structural diagram of an object detection apparatus 2800 according to an embodiment of the present disclosure, and the object detection apparatus may include an obtaining module 2801 and a processing module 2802, where details of each module are described below.
An obtaining module 2801, configured to obtain a first image; a processing module 2802, configured to process the first image to obtain Q regions of interest and determine the coordinates of the image coordinate system corresponding to the reference point in each region of interest, where Q is a positive integer; the processing module 2802 is configured to determine, according to the coordinates of the image coordinate system corresponding to the reference point in each region of interest, the coordinates of the vehicle body coordinate system corresponding to the reference point in each region of interest; the processing module 2802 is configured to determine a three-dimensional model of the first target object; the processing module 2802 is configured to determine at least one coordinate of the three-dimensional model of the first target object corresponding to the vehicle body coordinate system according to the coordinates of the vehicle body coordinate system corresponding to the reference point in each region of interest and the vertex set of the three-dimensional model of the first target object; the processing module 2802 is configured to project the at least one coordinate of the three-dimensional model of the first target object corresponding to the vehicle body coordinate system in the image coordinate system to obtain Q pixel regions; and the processing module 2802 is configured to determine the detection result of the target according to the Q regions of interest and the Q pixel regions.
In a possible implementation manner, the processing module 2802 is further configured to perform screening processing on Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest, where each pixel region corresponds to one region of interest, R is a positive integer and R is less than or equal to Q; determining R circumscribed rectangles corresponding to the R interested areas; and determining the detection result of the target according to the R circumscribed rectangles.
In yet another possible implementation manner, the processing module 2802 is further configured to, in a case that a ratio of an area of a first region of interest of the Q regions of interest to a convex envelope area of the first region of interest is greater than a first preset value, regard the first region of interest as one of the R regions of interest; under the condition that the ratio of the convex envelope aspect ratio of a first interested region in the Q interested regions to the aspect ratio of a pixel region corresponding to the first interested region meets the condition that the ratio is greater than a second preset value and smaller than a third preset value, taking the first interested region as one of the R interested regions; taking the first interested area as one of the R interested areas under the condition that the ratio of the convex envelope area of the first interested area in the Q interested areas to the area of the pixel area corresponding to the first interested area meets the condition that the ratio is more than the second preset value and less than the third preset value; and under the condition that the intersection ratio IOU of the convex envelope contour of the first interested region in the Q interested regions and the contour of the pixel region corresponding to the first interested region is larger than a fourth preset value, taking the first interested region as one of the R interested regions.
In yet another possible implementation manner, the processing module 2802 is further configured to use a pre-trained classifier to evaluate and score the content in a circumscribed rectangle of a first region of interest of the Q regions of interest, so as to obtain a first score; and taking the first interested area as one of the R interested areas when the first score is higher than a fifth preset value.
In yet another possible implementation manner, the processing module 2802 is further configured to calculate the areas of the R bounding rectangles; and determining the detection result of the target according to the areas of the R circumscribed rectangles.
In yet another possible implementation manner, the processing module 2802 is further configured to use a pre-trained classifier to evaluate and score the contents in the R circumscribed rectangles, so as to obtain R scores; and determining the detection result of the target according to the R scores.
In yet another possible implementation, the set of vertices of the three-dimensional model of the first target object includes: a first upper vertex (0, 0, H1), and n equal-division points on the first base circle corresponding to the first upper vertex, (R1·cos(2πk/n), R1·sin(2πk/n), 0), where H1 represents a first height, R1 represents the radius of the first base circle corresponding to the first upper vertex, the center of the first base circle is taken as the origin (0,0,0) of a three-dimensional model coordinate system whose coordinate axes are defined as X-axis forward, Y-axis leftward and Z-axis upward, k = 0, 1, 2, …, n−1, and n is a positive integer.
In yet another possible implementation, the set of vertices of the three-dimensional model of the first target object includes: a second upper vertex (0, 0, L×cos(π/3)), a left vertex (0, L/2, 0) and a right vertex (0, −L/2, 0), where L represents the side length, and the center of the bottom edge is taken as the origin (0,0,0) of a three-dimensional model coordinate system whose coordinate axes are defined as X-axis forward, Y-axis leftward and Z-axis upward.
In yet another possible implementation, the set of vertices of the three-dimensional model of the first target object includes: m equal-division points on the second base circle, (R2·cos(2πk/m), R2·sin(2πk/m), 0), and m equal-division points on the top surface circle corresponding to the second base circle, (R2·cos(2πk/m), R2·sin(2πk/m), H2), where H2 represents the second height, R2 represents the radius of the second base circle, k = 0, 1, 2, …, m−1, and m is a positive integer.
It should be noted that the implementation and beneficial effects of the respective modules correspond to the description of the method embodiment shown in fig. 4.
Referring to fig. 29, fig. 29 is a diagram of an object detection apparatus 2900 according to an embodiment of the present application, where the apparatus 2900 includes a processor 2901, a communication interface 2903, and optionally a memory 2902, and the processor 2901, the memory 2902, and the communication interface 2903 are connected to each other through a bus 2904.
The memory 2902 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) or portable read-only memory (CD-ROM), and is used to store the associated computer programs and data. The communication interface 2903 is used to receive and transmit data.
The processor 2901 can be one or more Central Processing Units (CPUs), and when the processor 2901 is a CPU, the CPU can be a single-core CPU or a multi-core CPU.
The processor 2901 in the apparatus 2900 is configured to read the computer program code stored in the memory 2902 and execute the method described above with respect to fig. 4.
The object detection apparatus may be a vehicle having an object detection function, or another component having an object detection function. The object detection apparatus includes, but is not limited to: an on-board terminal, an on-board controller, an on-board module, an on-board component, an on-board chip, an on-board unit, an on-board radar or a camera, through which the vehicle may implement the method provided by the present application.
The object detection apparatus may also be another intelligent terminal having an object detection function other than a vehicle, or be provided in such an intelligent terminal or in a component of such an intelligent terminal. The intelligent terminal may be intelligent transportation equipment, smart home equipment, a robot, or other terminal equipment. The object detection apparatus includes, but is not limited to, the intelligent terminal itself or, within the intelligent terminal, a controller, a chip, a sensor such as a radar or a camera, and other components.
The object detection means may be a general purpose device or a special purpose device. In a specific implementation, the apparatus may also be a desktop computer, a laptop computer, a network server, a Personal Digital Assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, or other devices with processing functions. The embodiment of the present application does not limit the type of the object detection device.
The object detection device may also be a chip or a processor with processing functionality, and the object detection device may comprise a plurality of processors. The processor may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. The chip or processor having the processing function may be provided in the sensor, or may be provided not in the sensor but on a receiving end of the sensor output signal.
An embodiment of the present application further provides a chip system, where the chip system includes at least one processor and a communication interface, the at least one processor is configured to call a computer program through the communication interface, and when the processor executes the program, the method flow shown in fig. 4 is implemented.
Embodiments of the present application further provide a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the method flow shown in fig. 4 is implemented.
The embodiment of the present application further provides a computer program product, and when the computer program product runs on a computer, the method flow shown in fig. 4 is implemented.
Embodiments of the present application further provide a vehicle, which includes at least one object detection apparatus described above.

One of ordinary skill in the art will appreciate that all or part of the processes of the methods of the above embodiments may be implemented by hardware associated with a computer program; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The aforementioned storage medium includes various media that can store computer program code, such as a ROM, a RAM, a magnetic disk or an optical disk.

Claims (22)

1. A method of object detection, comprising:
acquiring a first image;
processing the first image to obtain Q interested areas, and determining the coordinates of an image coordinate system corresponding to the reference point in each interested area;
determining the coordinates of the vehicle body coordinate system corresponding to the reference point in each interested region according to the coordinates of the image coordinate system corresponding to the reference point in each interested region;
determining a three-dimensional model of a first target object;
determining at least one coordinate of the three-dimensional model of the first target object corresponding to the vehicle body coordinate system according to the coordinate of the vehicle body coordinate system corresponding to the reference point in each region of interest and the vertex set of the three-dimensional model of the first target object;
projecting at least one coordinate of the three-dimensional model of the first target object corresponding to the vehicle body coordinate system in the image coordinate system to obtain Q pixel areas;
determining the detection result of the target according to the Q interested areas and the Q pixel areas; wherein Q is a positive integer.
2. The method of claim 1, wherein said determining the detection of the object based on the Q regions of interest and the Q pixel regions comprises:
screening Q interested areas corresponding to the Q pixel areas to obtain R interested areas, wherein each pixel area corresponds to one interested area, R is a positive integer, and R is less than or equal to Q;
determining R circumscribed rectangles corresponding to the R interested areas;
and determining the detection result of the target according to the R circumscribed rectangles.
3. The method according to claim 2, wherein the screening of the Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest includes at least one of:
when the proportion of the area of a first interested area in the Q interested areas to the convex envelope area of the first interested area is larger than a first preset value, taking the first interested area as one of the R interested areas;
when the ratio of the convex hull aspect ratio of a first interested area in the Q interested areas to the aspect ratio of a pixel area corresponding to the first interested area meets the condition that the ratio is greater than a second preset value and smaller than a third preset value, taking the first interested area as one of the R interested areas;
when the ratio of the convex envelope area of the first interested area in the Q interested areas to the area of the pixel area corresponding to the first interested area meets the condition that the ratio is greater than the second preset value and smaller than the third preset value, taking the first interested area as one of the R interested areas;
and when the intersection ratio IOU of the convex envelope contour of the first interested region in the Q interested regions and the contour of the pixel region corresponding to the first interested region is larger than a fourth preset value, taking the first interested region as one of the R interested regions.
4. The method according to claim 2, wherein the screening of the Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest comprises:
evaluating and scoring the content in a circumscribed rectangle of a first interested region in the Q interested regions by using a pre-trained classifier to obtain a first score;
and when the first score is higher than a fifth preset value, taking the first interested area as one of the R interested areas.
5. The method according to any one of claims 2-4, wherein the determining the detection result of the target according to the R circumscribed rectangles comprises:
calculating the areas of the R circumscribed rectangles;
and determining the detection result of the target according to the areas of the R circumscribed rectangles.
6. The method according to any one of claims 2-4, wherein the determining the detection result of the target according to the R circumscribed rectangles comprises:
evaluating and scoring the contents in the R circumscribed rectangles by using a pre-trained classifier to obtain R scores;
and determining the detection result of the target according to the R scores.
7. The method of any of claims 1-6, wherein the set of vertices of the three-dimensional model of the first target object comprises:
a first upper vertex (0, 0, H1), and n equal-division points on the first base circle corresponding to the first upper vertex, (R1·cos(2πk/n), R1·sin(2πk/n), 0),
wherein H1 represents a first height, R1 represents a radius of the first base circle corresponding to the first upper vertex, the center of the first base circle is taken as an origin (0,0,0) of a three-dimensional model coordinate system whose coordinate axes are defined as X-axis forward, Y-axis leftward and Z-axis upward, k = 0, 1, 2, …, n−1, and n is a positive integer.
8. The method of any of claims 1-6, wherein the set of vertices of the three-dimensional model of the first target object comprises:
a second upper vertex (0, 0, L×cos(π/3)), a left vertex (0, L/2, 0) and a right vertex (0, −L/2, 0),
wherein L represents the side length, and the center of the bottom edge is taken as the origin (0,0,0) of a three-dimensional model coordinate system whose coordinate axes are defined as X-axis forward, Y-axis leftward and Z-axis upward.
9. The method of any of claims 1-7, wherein the set of vertices of the three-dimensional model of the first target object comprises:
m equal-division points on the second base circle, (R2·cos(2πk/m), R2·sin(2πk/m), 0), and m equal-division points on the top surface circle corresponding to the second base circle, (R2·cos(2πk/m), R2·sin(2πk/m), H2),
wherein H2 represents the second height, R2 represents the radius of the second base circle, k = 0, 1, 2, …, m−1, and m is a positive integer.
10. An object detection device, comprising:
the acquisition module is used for acquiring a first image;
the processing module is used for processing the first image to obtain Q interested areas and determining the coordinates of an image coordinate system corresponding to the reference point in each interested area;
the processing module is used for determining the coordinates of the vehicle body coordinate system corresponding to the reference point in each interested region according to the coordinates of the image coordinate system corresponding to the reference point in each interested region;
the processing module is used for determining a three-dimensional model of the first target object;
the processing module is configured to determine at least one coordinate of the three-dimensional model of the first target object in the vehicle coordinate system according to the coordinate of the vehicle coordinate system corresponding to the reference point in each region of interest and the vertex set of the three-dimensional model of the first target object;
the processing module is used for projecting at least one coordinate of the three-dimensional model of the first target object in the vehicle body coordinate system in the image coordinate system to obtain Q pixel areas;
the processing module is configured to determine a detection result of the target according to the at least one region of interest and the Q pixel regions, where Q is a positive integer.
11. The apparatus of claim 10,
the processing module is further configured to perform screening processing on Q regions of interest corresponding to the Q pixel regions to obtain R regions of interest, where each pixel region corresponds to one region of interest, R is a positive integer and R is less than or equal to Q; determining R circumscribed rectangles corresponding to the R interested areas; and determining the detection result of the target according to the R circumscribed rectangles.
12. The apparatus of claim 11,
the processing module is further configured to, when a ratio of an area of a first region of interest of the Q regions of interest to a convex envelope area of the first region of interest is greater than a first preset value, take the first region of interest as one of the R regions of interest; under the condition that the ratio of the convex envelope aspect ratio of a first interested region in the Q interested regions to the aspect ratio of a pixel region corresponding to the first interested region meets the condition that the ratio is greater than a second preset value and smaller than a third preset value, taking the first interested region as one of the R interested regions; taking the first interested area as one of the R interested areas under the condition that the ratio of the convex envelope area of the first interested area in the Q interested areas to the area of the pixel area corresponding to the first interested area meets the condition that the ratio is more than the second preset value and less than the third preset value; and under the condition that the intersection ratio IOU of the convex envelope contour of the first interested region in the Q interested regions and the contour of the pixel region corresponding to the first interested region is greater than a fourth preset value, taking the first interested region as one of the R interested regions.
13. The apparatus of claim 11,
the processing module is further used for evaluating and scoring the content in the circumscribed rectangle of the first interested area in the Q interested areas by using a pre-trained classifier to obtain a first score; and taking the first interested area as one of the R interested areas when the first score is higher than a fifth preset value.
14. The apparatus according to any one of claims 11-13,
the processing module is further used for calculating the areas of the R circumscribed rectangles; and determining the detection result of the target according to the areas of the R circumscribed rectangles.
15. The apparatus according to any one of claims 11-13,
the processing module is further used for evaluating and scoring the contents in the R circumscribed rectangles by using a pre-trained classifier to obtain R scores; and determining the detection result of the target according to the R scores.
16. The apparatus of any of claims 10-15, wherein the set of vertices of the three-dimensional model of the first target object comprises:
a first upper vertex (0, 0, H1), and n equal-division points on the first base circle corresponding to the first upper vertex, (R1·cos(2πk/n), R1·sin(2πk/n), 0),
wherein H1 represents a first height, R1 represents a radius of the first base circle corresponding to the first upper vertex, the center of the first base circle is taken as an origin (0,0,0) of a three-dimensional model coordinate system whose coordinate axes are defined as X-axis forward, Y-axis leftward and Z-axis upward, k = 0, 1, 2, …, n−1, and n is a positive integer.
17. The apparatus of any of claims 10-15, wherein the set of vertices of the three-dimensional model of the first target object comprises:
a second upper vertex (0, 0, L×cos(π/3)), a left vertex (0, L/2, 0) and a right vertex (0, −L/2, 0),
wherein L represents the side length, and the center of the bottom edge is taken as the origin (0,0,0) of a three-dimensional model coordinate system whose coordinate axes are defined as X-axis forward, Y-axis leftward and Z-axis upward.
18. The apparatus of any of claims 10-15, wherein the set of vertices of the three-dimensional model of the first target object comprises:
m equal-division points on the second base circle, (R2·cos(2πk/m), R2·sin(2πk/m), 0), and m equal-division points on the top surface circle corresponding to the second base circle, (R2·cos(2πk/m), R2·sin(2πk/m), H2),
wherein H2 represents the second height, R2 represents the radius of the second base circle, k = 0, 1, 2, …, m−1, and m is a positive integer.
19. An object detection device, comprising: a processor and a memory; the memory for storing one or more programs, the one or more programs comprising computer executable instructions, which when executed by the apparatus, cause the apparatus to perform the method of any of claims 1-9 by executing the one or more programs stored by the memory.
20. A chip system, characterized in that the chip system comprises at least one processor and an acquisition interface, the at least one processor being configured to invoke a computer program from the acquisition interface, and when the processor executes the instructions, to cause an apparatus in which the chip system is located to implement the method according to any one of claims 1 to 9.
21. A computer-readable storage medium, in which a computer program is stored which, when run on a computer, causes the computer to carry out the method of any one of claims 1-9.
22. A vehicle characterized in that the vehicle comprises an object detection device according to any one of claims 10 to 19.