WO2021185812A1 - Object detection and guiding a vehicle

Info

Publication number
WO2021185812A1
Authority
WO
WIPO (PCT)
Prior art keywords
bounding box
computing unit
initial
camera
image
Application number
PCT/EP2021/056632
Other languages
French (fr)
Inventor
Prashanth Viswanath
Ciaran Hughes
Original Assignee
Connaught Electronics Ltd.
Application filed by Connaught Electronics Ltd.
Publication of WO2021185812A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads

Definitions

  • the present invention relates to a method for object detection, wherein a computing unit is used to receive an image from a camera and to divide the image into a plurality of cells and select one of the cells and to determine a bounding box for an object on the image by fitting an anchor box to the object, wherein at least a part of the object is located in the selected cell.
  • the invention further relates to a corresponding method for guiding a vehicle at least in part automatically, to an electronic vehicle guidance system and to a computer program product.
  • Deep learning may be used for example for object detection, classification or segmentation. It may be used for various driver assistance applications or autonomous driving. For object detection and pedestrian detection, for example single shot detectors may be used.
  • Such detectors may divide an image into multiple smaller cells and within these cells they have a pre-defined set of anchors, which are used to fit the objects that lie within that cell. These anchors assume that the objects are vertical and standing on a flat ground plane.
  • the improved concept is based on the idea to replace a constant reference direction, such as vertical or horizontal direction, throughout the image by a locally varying or location-dependent reference direction or, in particular, a locally dependent vertical or horizontal direction.
  • an anchor box is fitted to the object to determine a bounding box.
  • the anchor box or the bounding box are determined by rotating an initial anchor box or an initial bounding box, respectively, by a location-dependent angle or rotation parameter.
  • a method for object detection is provided.
  • a computing unit in particular a computing unit of a vehicle, is used to receive an image from a camera, in particular a camera of the vehicle, and the computing unit is used to divide the image into a plurality of cells and to select one cell of the plurality of cells.
  • the computing unit is used to determine a bounding box for an object on the image by fitting an anchor box to the object, wherein at least a part of the object is located in the selected cell.
  • the computing unit is used to retrieve a predetermined rotation parameter from a storage medium, in particular from a storage medium of the computing unit or of the camera or of an electronic vehicle guidance system of the vehicle, wherein the rotation parameter, in particular the value of the rotation parameter, depends on a location of the selected cell within the image.
  • the computing unit is used to determine the anchor box by rotating a predefined initial anchor box depending on the rotation parameter.
  • the computing unit is used to determine the bounding box by rotating an initial bounding box depending on the rotation parameter, wherein the initial bounding box is, in particular, determined by fitting the initial anchor box to the object.
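As an illustration of the rotation step, the following minimal Python/NumPy sketch (not part of the patent) rotates an axis-aligned box, which may stand in for an initial anchor box or an initial bounding box, about its own centre by a per-cell rotation parameter. Representing the box by its four corner points and rotating about the box centre are assumptions made here for illustration; the patent does not prescribe a particular box representation or rotation centre.

```python
import numpy as np

def rotate_box(corners: np.ndarray, angle_rad: float) -> np.ndarray:
    """Rotate the four corner points of a box (shape (4, 2), pixel coordinates)
    around the centre of the box by the given angle in radians."""
    centre = corners.mean(axis=0)
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, -s], [s, c]])
    return (corners - centre) @ rot.T + centre

# example: a 40 x 80 px initial anchor box, rotated by this cell's rotation parameter
initial_anchor = np.array([[0.0, 0.0], [40.0, 0.0], [40.0, 80.0], [0.0, 80.0]])
rotated_anchor = rotate_box(initial_anchor, np.deg2rad(12.0))
```

The same helper can serve both variants of the method: rotating the initial anchor box before fitting, or rotating the fitted initial bounding box afterwards.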
  • Dividing the image into the plurality of cells may be understood such that a predefined grid, for example a rectangular or quadratic grid, is superposed to the image.
  • the described steps of selecting a cell, determining the bounding box, retrieving the rotation parameter and determining the anchor box or bounding box by rotation of the initial anchor box or the initial bounding box, respectively, may for example be carried out for all of the grid cells, provided an object or a part of an object is present in the respective cell.
  • the method steps described above relate to one specific cell of the plurality of cells, namely the selected cell, but can be carried out for all other cells as well to perform the object detection on the full image.
  • the anchor box may belong to a set of anchor boxes associated to the respective selected cell.
  • Each anchor box of the set can be understood as a typical or potential bounding box for objects to be expected in the image.
  • the different anchor boxes of the set of anchor boxes of the selected cell may have different sizes and/or different shapes.
  • the anchor box may be of rectangular or quadratic shape. However, this is not necessarily the case. It is sufficient that the anchor box has a well-defined orientation, which, in case of a rectangle, may be given by the direction of one of the sides of the rectangle. However, in case of other shapes, other well-defined orientations may be defined.
  • the initial anchor box may also be an initial anchor box of a set of initial anchor boxes associated to the selected cell.
  • the respective sets of initial anchor boxes may be identical for all cells of the grid. This may not be the case for the various sets of anchor boxes of different cells, which may depend on the respective rotation parameter in some implementations.
  • Fitting the anchor box to the object may for example be understood as selecting the anchor box of the respective set of anchor boxes, which fits the object best.
  • fitting the anchor box may correspond to selecting the optimal anchor box of the set of anchor boxes or to a minimization of an error for approximating the object by the respective anchor box.
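As a hedged sketch of what "fits the object best" can mean in practice, the snippet below selects the anchor box with the highest intersection over union (IoU) with a given object box (for example a ground-truth box during training). IoU is one common error measure; the patent only requires that some fitting error be minimized.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def best_anchor(anchors, object_box):
    """Pick the anchor of the cell's set that approximates the object box best."""
    return max(anchors, key=lambda anchor: iou(anchor, object_box))

# hypothetical anchor set of one cell and an object box
anchors = [(0, 0, 40, 80), (0, 0, 80, 40), (0, 0, 60, 60)]
print(best_anchor(anchors, (5, 2, 42, 85)))   # -> (0, 0, 40, 80)
```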
  • the rotation parameter corresponds to a rotation angle with respect to a predefined reference direction defined globally for the whole image.
  • the reference direction is the same for all cells.
  • the reference direction may correspond to the vertical or horizontal direction according to one of the sides of the image. This is, in case of a rectangular grid, equivalent to choosing a grid direction as the reference direction.
  • Determining the bounding box for the object can be understood as an object detection task.
  • the object detection may or may not comprise further tasks.
  • the object detection may comprise determining further bounding boxes for further objects on the image, in particular in the same way as described for the bounding box and the selected cell.
  • the object detection may also comprise further tasks based on the bounding box or using the bounding box, such as object tracking or segmentation tasks.
  • Determining the bounding box by fitting the anchor box to the object may for example be carried out by applying a trained algorithm, in particular an algorithm based on machine learning and/or computer vision, to the image or to the individual cells.
  • the predefined rotation parameter may for example be provided to the algorithm.
  • determining the rotation parameter may be part of the algorithm.
  • the trained algorithm may for example be based on a trained artificial neural network, for example a convolutional neural network, CNN.
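Purely for illustration, a single-shot detection head of the kind referred to above can be sketched as follows in PyTorch; the module name, the channel layout and the choice of four box offsets plus one score per anchor are assumptions, not details taken from the patent.

```python
import torch
import torch.nn as nn

class AnchorBoxHead(nn.Module):
    """Minimal single-shot detection head: for every grid cell it predicts,
    per anchor box, four box-regression offsets and one objectness score."""

    def __init__(self, in_channels: int, num_anchors: int):
        super().__init__()
        self.num_anchors = num_anchors
        # 4 offsets (dx, dy, dw, dh) + 1 score per anchor and cell
        self.conv = nn.Conv2d(in_channels, num_anchors * 5, kernel_size=3, padding=1)

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        b, _, h, w = feature_map.shape
        out = self.conv(feature_map)
        # reshape to (batch, grid_h, grid_w, anchors, 5) for per-cell decoding
        return out.view(b, self.num_anchors, 5, h, w).permute(0, 3, 4, 1, 2)

# example: a 20 x 12 cell grid with 6 anchors per cell
head = AnchorBoxHead(in_channels=256, num_anchors=6)
predictions = head(torch.randn(1, 256, 12, 20))   # shape (1, 12, 20, 6, 5)
```

Each spatial position of the feature map corresponds to one cell of the grid, and the per-anchor offsets are decoded relative to that cell's (possibly rotated) anchor boxes.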
  • a method according to the improved concept effectively uses local anchor boxes or local bounding boxes having location-dependent orientations with respect to the reference direction. In this way, deviations from the assumption regarding vertical objects on flat ground in the image may be compensated, which results in a higher accuracy of the bounding box and the object detection.
  • the camera may not necessarily map horizontal or vertical lines in the real world to horizontal or vertical lines, respectively, in the image for various reasons.
  • the camera is a non-rectilinear camera, such as a fisheye camera.
  • Such cameras do in general not map straight lines in the real world to straight lines on the image. Therefore, the assumption stated above is intrinsically wrong in this case.
  • Another reason may be that, even for rectilinear cameras, vertical objects or horizontal objects in the real world do not necessarily map to vertical images or horizontal images of said object due to perspective effects. In other words, this may be the case if the object is not positioned on an optical axis of the camera. Both sources of inaccuracy may be compensated by the improved concept.
  • a non-rectilinear camera can be understood as a camera with a non-rectilinear lens or lens unit.
  • a non-rectilinear lens or lens unit can be understood as a lens or lens unit, that is one or more lenses, having a non-rectilinear mapping function, also denoted as curvilinear mapping function.
  • fisheye cameras represent non-rectilinear cameras.
  • the mapping function of the lens or lens unit can be understood as a function r(θ) mapping an angle θ from the optical axis of the lens or lens unit to a radial shift r out of the image center.
  • the function depends parametrically on the focal length f of the lens or lens unit.
  • a rectilinear lens or lens unit maps straight lines in the real world to straight lines in the image, at least up to lens imperfections.
  • a non-rectilinear or curvilinear lens or lens unit does, in general, not map straight lines to straight lines in the image.
  • the mapping function of a non-rectilinear camera can be stereographic, equidistant, equisolid angle or orthographic.
  • Other examples of mapping functions of non-rectilinear lens units are polynomial functions.
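The named mapping functions have standard closed forms, summarized in the sketch below (angles in radians, focal length f); a real camera may instead be described by a calibrated polynomial, as noted above.

```python
import numpy as np

def radial_shift(theta, f, model="equidistant"):
    """Radial distance r(theta) from the image centre for a ray at angle theta
    (radians) off the optical axis, for a lens with focal length f."""
    if model == "rectilinear":       # gnomonic mapping: straight lines stay straight
        return f * np.tan(theta)
    if model == "stereographic":
        return 2.0 * f * np.tan(theta / 2.0)
    if model == "equidistant":       # widely used fisheye model
        return f * theta
    if model == "equisolid":
        return 2.0 * f * np.sin(theta / 2.0)
    if model == "orthographic":
        return f * np.sin(theta)
    raise ValueError(f"unknown mapping model: {model}")

# example: image radius of a ray 60 degrees off-axis for a 4 mm fisheye lens
print(radial_shift(np.deg2rad(60.0), f=4.0))   # equidistant model: about 4.19 mm
```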
  • Bounding boxes may for example be used for determining the distance between the camera and the object. In case of an incorrect or inaccurate bounding box, the distance may be estimated smaller than it actually is. This leads to pessimistic distance estimations, for example in the case of automotive applications. The improved concept may therefore also lead to less pessimistic distance estimations. This may be particularly beneficial in the context of partly or fully autonomous driving or parking applications.
  • the camera is used to generate the image.
  • the computing unit is used to estimate a distance between the object and the camera depending on the bounding box.
  • the computing unit may be used to determine the position of a foot point or reference point of the bounding box and estimate the distance between the camera and the object depending on the position of the foot point or reference point.
  • the foot point may for example correspond to a point on a lower side of the rectangle, in particular a center point of the lower side.
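A minimal flat-ground sketch of such a distance estimate is given below. It assumes an ideal pinhole camera whose optical axis is parallel to a flat ground plane and whose mounting height is known; all parameter names are illustrative, and a fisheye camera would additionally require the inverse mapping function discussed further below.

```python
def distance_from_foot_point(v_foot: float, cy: float, f_px: float, cam_height_m: float) -> float:
    """Flat-ground range estimate from the bounding-box foot point.

    Assumes a pinhole camera with its optical axis parallel to flat ground,
    mounted cam_height_m above it; v_foot is the pixel row of the foot point,
    cy the principal-point row and f_px the focal length in pixels.
    """
    dv = v_foot - cy
    if dv <= 0:
        raise ValueError("foot point must lie below the principal point")
    return cam_height_m * f_px / dv

# example: foot point 120 px below the principal point, f = 800 px, camera 1.2 m high
print(distance_from_foot_point(v_foot=620.0, cy=500.0, f_px=800.0, cam_height_m=1.2))  # ~8 m
```

An inaccurate (for example, too low) foot point makes dv larger and the estimated distance smaller, which illustrates the pessimistic estimates mentioned above.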
  • the computing unit is used to rotate the initial anchor box depending on the rotation parameter to determine the anchor box. Then, that is after rotating the initial anchor box, the computing unit is used to determine the bounding box by fitting the rotated initial anchor box to the object.
  • the anchor box used for determining the bounding box corresponds to the rotated initial anchor box.
  • Such implementations allow for the highest accuracy of determining the bounding box, since the trained algorithm may for example be trained based on the rotated initial anchor boxes.
  • a bounding box algorithm is trained, in particular by using a training computing unit, based on training images and on the rotated anchor boxes.
  • the computing unit is used to apply the trained bounding box algorithm to the image to determine the bounding box by fitting the rotated initial anchor box to the object.
  • the bounding box algorithm may for example comprise the neural network or the CNN.
  • the training is performed based on all rotated initial anchor boxes of all cells of the plurality of cells. As described above, such implementations allow for highest accuracy of determining the bounding box.
  • the computing unit is used to determine the initial bounding box by fitting the initial anchor box to the object. Then, that is after determining the initial bounding box, the computing unit is used to determine the bounding box by rotating the initial bounding box depending on the rotation parameter.
  • the initial anchor box corresponds to the anchor box used for determining the bounding box.
  • Such implementations may have the advantage that the method can be used as a post processing, in case the bounding box algorithm itself cannot or shall not be modified.
  • the bounding box algorithm is trained, in particular by the training computing unit, based on training images and on the initial anchor boxes.
  • the computing unit is used to apply the trained bounding box algorithm to the image to determine the initial bounding box by fitting the initial anchor box to the object.
  • the training is performed based on all initial anchor boxes of all cells of the plurality of cells.
  • the rotation parameter is then used only after the training is completed. Consequently, such implementations are suitable for post-processing of the bounding boxes in the images.
  • the computing unit is used to select an initial point on an image plane of the camera, wherein the initial point is associated to the selected cell.
  • the computing unit is used to generate a projection vector pointing from a projection center point of the camera to a projection point, wherein a mapping function of the camera maps the projection point to the initial point.
  • the computing unit is used to determine the rotation parameter depending on the projection vector.
  • the computing unit is used to store the rotation parameter to the storage medium.
  • the initial point on the image plane is given by two-dimensional coordinates on the image plane and is, in particular, independent of any information contained by the image.
  • the initial point only corresponds to a position and has no information content corresponding to the image.
  • the image plane corresponds to an active surface or a part of the active surface of an image sensor of the camera.
  • the initial point being associated to the cell can be understood such that the initial point has a well-defined position with respect to the cell, for example corresponds to a center point of the cell, a corner point of the cell or to a point with a defined relationship with respect to the center point or corner point.
  • the projection center point of the camera corresponds to a projection center point of a lens of the camera.
  • the projection center point may correspond to the center of the lens.
  • since the mapping function maps the projection point to the initial point, the initial point is mapped to the projection point by the inverse mapping function.
  • the projection point can be determined by applying the inverse mapping function to the initial point.
  • the mapping function is, in particular, determined upfront, that is before the method according to the improved concept is carried out.
  • the mapping function may be determined during a calibration phase of the camera.
  • the mapping function may be saved or stored to the storage element.
  • the mapping function can also be considered as one or more intrinsic calibration parameter of the camera.
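As a concrete, hedged example, the inverse mapping of an equidistant fisheye model r(θ) = f·θ can be written as follows; it turns an initial point on the image plane into a unit projection vector leaving the projection centre, expressed in the camera coordinate system. The equidistant model is only one possible calibration; other mapping functions would be inverted analogously (or numerically, for polynomial models).

```python
import numpy as np

def pixel_to_ray(u, v, cx, cy, f_px):
    """Inverse of an equidistant fisheye mapping r(theta) = f * theta: turn an
    initial point (u, v) on the image plane into a unit projection vector in the
    camera coordinate system (x right, y down, z along the optical axis).
    The scale is arbitrary, since every point on the ray maps to the same pixel."""
    du, dv = u - cx, v - cy
    r = np.hypot(du, dv)
    if r < 1e-9:
        return np.array([0.0, 0.0, 1.0])      # the point lies on the optical axis
    theta = r / f_px                           # invert r(theta) = f * theta
    sin_t = np.sin(theta)
    return np.array([sin_t * du / r, sin_t * dv / r, np.cos(theta)])

# example: initial point at the centre of a grid cell
p = pixel_to_ray(u=850.0, v=240.0, cx=640.0, cy=480.0, f_px=320.0)
```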
  • in case of a rectilinear camera, the projection vector also points from the initial point to the projection center point. However, for a non-rectilinear camera, this may not be the case.
  • the projection vector can be understood as a bound vector.
  • the computing unit is used to determine a reference vector depending on a pose of the camera and to construct a projection plane containing the reference vector and the projection vector and to determine the rotation parameter depending on the projection plane.
  • the pose of the camera is given by a position and an orientation of the camera.
  • the pose corresponds to the position and orientation of a sensor coordinate system of the camera with respect to a reference coordinate system, for example a reference coordinate system being rigidly connected to a vehicle to which the camera may be mounted.
  • the pose is therefore given by six parameters including three translational parameters defining the position of the camera, in particular the translational shift of the sensor coordinate system with respect to the reference coordinate system in the three spatial dimensions.
  • the six parameters further comprise three angular parameters, which may for example be given by a roll angle, a pitch angle, and a yaw angle.
  • Roll angle, pitch angle, and yaw angle may be defined as rotation angles or Euler angles of the sensor coordinate system with respect to the reference coordinate system according to a predefined convention.
  • the convention may for example be that the sensor coordinate system results from the reference coordinate system due to the following three rotations. Therein, it is assumed that the sensor coordinate system and the reference coordinate system are initially identical to each other. The sensor coordinate system is rotated around the z-axis of the reference coordinate system by the yaw angle. Then, the resulting sensor coordinate system is rotated around the resulting y-axis of the resulting sensor coordinate system by the pitch angle. Then, the resulting sensor coordinate system is rotated around the resulting x-axis of the resulting sensor coordinate system by the roll angle. Different conventions are possible as well.
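The convention described above corresponds to intrinsic z-y'-x'' rotations, which can be sketched as follows; under this reading, the resulting matrix transforms coordinates given in the sensor coordinate system into the reference coordinate system. This is one consistent interpretation offered for illustration, not a definition taken verbatim from the patent.

```python
import numpy as np

def rotation_from_ypr(yaw, pitch, roll):
    """Rotation matrix for the convention above: rotate about z by the yaw angle,
    then about the resulting y-axis by the pitch angle, then about the resulting
    x-axis by the roll angle (intrinsic z-y'-x'' rotations, angles in radians)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    # composed matrix: maps sensor-frame coordinates into the reference frame
    return rz @ ry @ rx

# example: a camera yawed by 180 degrees and pitched by 30 degrees
R = rotation_from_ypr(np.deg2rad(180.0), np.deg2rad(30.0), 0.0)
```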
  • the reference coordinate system may be rigidly fixed to a host object on which the camera is mounted, wherein the host object may be the vehicle.
  • the pose in particular the six parameters defining the pose, can also be considered as extrinsic calibration parameters.
  • the reference vector corresponds to the direction with respect to which the rotation parameter shall be defined.
  • the reference vector may correspond to a vertical direction in the real world, to a vertical axis in the reference coordinate system, or to a vertical axis of the sensor coordinate system.
  • the reference vector is, in particular, a bound vector.
  • the reference vector may correspond to an axis of the reference coordinate system rotated according to the roll angle, the pitch angle and the yaw angle.
  • the reference vector has the same origin as the projection vector, namely the projection center point.
  • the projection plane therefore also contains the projection center point.
  • the computing unit is used to determine the reference vector depending on an orientation of the camera, in particular independent of a position of the camera.
  • the reference vector may be determined depending on the roll angle, the pitch angle and the yaw angle only.
  • the computing unit is used to map the projection plane onto a line in the image plane depending on the mapping function.
  • the computing unit is used to determine a tangent direction to the line at the initial point and to determine the rotation parameter depending on the tangent direction.
  • the tangent direction corresponds to the rotation parameter in the sense that the tangent direction includes an angle with the reference vector corresponding to the rotation angle.
  • Mapping the projection plane depending on the mapping function corresponds to mapping each point on the projection plane by applying the mapping function to that point. Since the projection plane contains the projection center point, the plane is mapped onto a line in the image plane.
  • the line may be straight or curved, depending on the mapping function. In particular, the line may be straight for a rectilinear camera, while it may be curved for a non-rectilinear camera.
  • the initial point lies on the line.
  • the tangent to the initial point is used as a local reference direction, which may, in the respective implementations, correspond to a local vertical direction or horizontal direction of the selected cell.
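If a parametric representation of the mapped line is available, the tangent direction and the resulting rotation parameter can be obtained numerically, for example as in the sketch below. Measuring the angle against the image's vertical (column) direction is an assumption here, consistent with using a vertical global reference direction.

```python
import numpy as np

def rotation_angle_from_line(line, s0, ds=1e-3):
    """Rotation parameter from a parametric representation of the mapped line.

    `line` maps a scalar parameter s to an image point (u, v); s0 is the value at
    which the line passes through the initial point. The tangent is estimated by a
    central difference and its angle is measured against the image's vertical
    (column) direction, here taken as the global reference direction."""
    u1, v1 = line(s0 - ds)
    u2, v2 = line(s0 + ds)
    du, dv = u2 - u1, v2 - v1
    return np.arctan2(du, dv)

# hypothetical example: a gently curved line through the initial point at s0 = 0
angle = rotation_angle_from_line(lambda s: (320.0 + 50.0 * s + 4.0 * s**2,
                                            240.0 + 100.0 * s), s0=0.0)
```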
  • the computing unit is used to determine a first auxiliary vector and a second auxiliary vector. Both auxiliary vectors have the projection center point as respective origins and lie within the projection plane. Both auxiliary vectors include the same predefined angle with the projection vector.
  • the computing unit is used to map respective end points of both auxiliary vectors onto respective mapped points on the image plane depending on the mapping function.
  • the computing unit is used to determine the rotation parameter depending on a straight line connecting the mapped points to each other.
  • the auxiliary vectors including the same angle with the projection vector can be understood such that the absolute values of the respective angles included with the projection vector are the same.
  • the straight line connecting the mapped points is, by construction, an approximation to the tangent direction to the line described above at the initial point. Consequently, the angle must be “small enough”. In other words, the error made by approximating the tangent direction by the straight line connecting the mapped points increases with increasing absolute value of the angle.
  • Such implementations may be used in case an exact expression or a closed parametric representation of the line corresponding to the mapped projection plane is not available or cannot be determined. Furthermore, such implementations may reduce the computational effort to determine the rotation parameter.
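A hedged end-to-end sketch of this approximation is given below for an equidistant fisheye model: the plane normal is taken as the cross product of the projection vector and the reference vector, the projection vector is rotated by ±α about this normal to obtain the two auxiliary vectors, their end points are projected into the image, and the angle of the connecting straight line against the image vertical is used as the rotation parameter. The projection model, the sign convention and the value of α are assumptions for illustration.

```python
import numpy as np

def project_equidistant(d, cx, cy, f_px):
    """Forward equidistant fisheye projection of a 3D direction d (camera frame)."""
    d = d / np.linalg.norm(d)
    theta = np.arccos(np.clip(d[2], -1.0, 1.0))     # angle off the optical axis
    r_xy = np.hypot(d[0], d[1])
    if r_xy < 1e-12:
        return np.array([cx, cy])
    r = f_px * theta                                 # r(theta) = f * theta
    return np.array([cx + r * d[0] / r_xy, cy + r * d[1] / r_xy])

def rotate_about_axis(v, axis, angle):
    """Rodrigues rotation of vector v about the given axis by the given angle."""
    axis = axis / np.linalg.norm(axis)
    return (v * np.cos(angle)
            + np.cross(axis, v) * np.sin(angle)
            + axis * np.dot(axis, v) * (1.0 - np.cos(angle)))

def rotation_parameter(p, v_ref, cx, cy, f_px, alpha=np.deg2rad(1.0)):
    """Per-cell rotation parameter from the projection vector p and the reference
    vector v_ref, both given in the camera coordinate system and starting at the
    projection centre. Returns the angle of the straight line connecting the two
    mapped auxiliary end points, measured against the image vertical."""
    n = np.cross(p, v_ref)                  # normal of the projection plane
    p1 = rotate_about_axis(p, n, +alpha)    # first auxiliary vector
    p2 = rotate_about_axis(p, n, -alpha)    # second auxiliary vector
    q1 = project_equidistant(p1, cx, cy, f_px)
    q2 = project_equidistant(p2, cx, cy, f_px)
    du, dv = q2 - q1
    return np.arctan2(du, dv)

# example: a cell whose projection vector points up and to the right of the axis
angle = rotation_parameter(p=np.array([0.4, -0.3, 0.87]),
                           v_ref=np.array([0.0, -1.0, 0.0]),
                           cx=640.0, cy=480.0, f_px=320.0)
```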
  • the computing unit is used to determine the anchor box by rotating the initial anchor box depending on the rotation parameter.
  • the anchor box, and in particular also the initial anchor box has the shape of a rectangle, wherein a side of the rectangle is parallel to the tangent direction or to the approximate tangent direction given by the straight line connecting the mapped points.
  • the computing unit is used to determine the bounding box by rotating the initial bounding box depending on the rotation parameter.
  • the bounding box and in particular also the initial bounding box, has the shape of a rectangle, wherein a side of the rectangle is parallel to the tangent direction or to the approximate tangent direction given by the straight line connecting the mapped points.
  • the steps of retrieving the rotation parameter and determining the anchor box by rotating the predefined initial anchor box or determining the bounding box by rotating the initial bounding box as well as the step of determining the bounding box by fitting the anchor box are performed for each cell of the plurality of cells.
  • steps described for determining the rotation parameter may be performed for each of the cells of the plurality of cells.
  • a method for guiding a vehicle at least in part automatically is provided.
  • a camera of the vehicle is used to generate an image depicting an environment of the vehicle and the vehicle is guided at least in part automatically, in particular by using an electronic vehicle guidance system, depending on a bounding box for an object on the image.
  • a method for object detection according to the improved concept is carried out, in particular by the electronic vehicle guidance system.
  • the method is designed as a method for parking the vehicle at least in part automatically.
  • the improved concept is particularly suitable, since, as described above, too pessimistic distance estimations may be avoided.
  • an electronic vehicle guidance system comprising a computing unit.
  • the computing unit is configured to receive an image from a camera and to divide the image into a plurality of cells and select one cell of the plurality of cells.
  • the computing unit is configured to determine a bounding box for an object on the image by fitting an anchor box to the object, wherein at least a part of the object is located in the selected cell.
  • the computing unit is configured to retrieve a predefined rotation parameter from a storage medium, in particular from a storage medium of the computing unit or the camera or the electronic vehicle guidance system, wherein the rotation parameter depends on a location of the selected cell within the image.
  • the computing unit is configured to determine the anchor box by rotating a predefined initial anchor box depending on the rotation parameter or to determine the bounding box by rotating an initial bounding box depending on the rotation parameter.
  • An electronic vehicle guidance system may be understood as an electronic system, configured to guide a vehicle in a fully automated or a fully autonomous manner and, in particular, without a manual intervention or control by a driver or user of the vehicle being necessary. The vehicle conducts required steering maneuvers, braking maneuvers and/or acceleration maneuvers and so forth automatically.
  • the electronic vehicle guidance system may implement a fully automatic or fully autonomous driving mode according to level 5 of the SAE J3016 classification.
  • An electronic vehicle guidance system may also be implemented as an advanced driver assistance system, ADAS, assisting a driver for partially automatic or partially autonomous driving.
  • the electronic vehicle guidance system may implement a partly automatic or partly autonomous driving mode according to levels 1 to 4 of the SAE J3016 classification.
  • SAE J3016 refers to the respective standard dated June 2018.
  • Guiding the vehicle at least in part automatically may therefore comprise guiding the vehicle according to a fully automatic or fully autonomous driving mode according to level 5 of the SAE J3016 classification. Guiding the vehicle at least in part automatically may also comprise guiding the vehicle according to a partly automatic or partly autonomous driving mode according to levels 1 to 4 of the SAE J3016 classification.
  • the vehicle guidance system comprises the camera and the camera is designed as a non-rectilinear camera, in particular as a fisheye camera.
  • the electronic vehicle guidance system may be configured to or programmed to perform a method according to the improved concept or the electronic vehicle guidance system performs such a method.
  • a computer program comprising instructions.
  • the instructions when they are executed by a computer system, cause the computer system to carry out a method according to the improved concept.
  • the computer system may comprise one or more computing units, for example the computing unit of the electronic vehicle guidance system and optionally the training computing unit.
  • the instructions when they are executed by an electronic vehicle guidance system according to the improved concept, in particular by the computing unit of the vehicle guidance system, cause the vehicle guidance system to carry out a method according to the improved concept.
  • the computer program as well as the computer-readable storage medium according to the improved concept can be considered as respective computer program products comprising the instructions.
  • Fig. 1 shows schematically a vehicle with an exemplary implementation of an electronic vehicle guidance system according to the improved concept
  • Fig. 2 shows a flow diagram of an exemplary implementation of a method according to the improved concept
  • Fig. 3 shows a flow diagram of a further exemplary implementation of a method according to the improved concept
  • Fig. 4 shows a flow diagram of a further exemplary implementation of a method according to the improved concept.
  • a vehicle 1 comprising an electronic vehicle guidance system 2 according to the improved concept is shown.
  • the vehicle guidance system 2 comprises a computing unit 3, which may be part of or comprise an electronic computing unit, ECU, of the vehicle 1.
  • the vehicle guidance system 2 further comprises a camera 4, for example a fisheye camera.
  • the vehicle guidance system 2 or the vehicle 1 further comprises a storage medium 9, which is coupled to the computing unit 3.
  • The functionality of the vehicle guidance system 2 is explained in more detail in the following with respect to implementations of methods according to the improved concept and in particular with reference to Fig. 2 to Fig. 4.
  • Fig. 2 shows a flow diagram of an exemplary implementation of a method for object detection according to the improved concept.
  • In step S1, the camera 4 generates an image 5 of an environment of the vehicle 1 and provides it to the computing unit 3.
  • the computing unit 3 divides the image 5 into a plurality of cells and selects one of them.
  • the computing unit 3 selects an initial point P on an image plane 10 of the camera 4, wherein the initial point P is associated to the selected cell.
  • the image plane 10 may for example correspond to the active surface of an image sensor of the camera 4 or part of the active surface.
  • the initial point P may correspond to a center of the selected cell, which may for example be a rectangular or quadratic cell.
  • the computing unit 3 generates a projection vector P pointing from a projection center point C of the camera 4 to a projection point.
  • the projection point corresponds to a point in the real world, which results from applying an inverse mapping function of the camera 4 to the initial point P.
  • the mapping function of the camera 4 maps the projection point to the initial point P.
  • the mapping function is a non-gnomonic function.
  • the computing unit 3 determines a reference vector V depending on a pose, in particular an orientation, of the camera 4 with respect to a vehicle coordinate system rigidly connected to the vehicle.
  • the reference vector V may correspond to a vertical axis of a sensor coordinate system rigidly connected to the camera, as depicted in Fig. 2.
  • the computing unit 3 constructs a projection plane containing the reference vector V and the projection vector P.
  • for example, a normal vector N of the projection plane may be computed as N = P × V, wherein × represents the vector product or cross product.
  • the projection plane can be considered as vertical, in case the reference vector V is interpreted as a global or overall vertical direction.
  • In step S2, the projection plane is then projected back into the image plane 10 by the computing unit 3.
  • the projection plane is mapped onto a line 11 in the image plane 10 depending on the mapping function.
  • the mapping function is applied by the computing unit 3 to each of the points on the projection plane or on a correspondingly discretized projection plane, to map those points to the image plane 10.
  • the computing unit 3 determines a tangent direction 12 to the line 11 at the initial point P, which, by construction, lies on the line 11.
  • the computing unit 3 may be able to determine an exact or parametric representation of the line 11 and compute the tangent direction 12 based on this representation.
  • the computing unit 3 determines the rotation parameter for the selected cell depending on the tangent direction 12.
  • the angle which the tangent direction includes with the reference vector V corresponds to the angle which defines the rotation parameter.
  • the described steps may be repeated for all cells of the plurality of cells.
  • a bounding box algorithm may be trained based on training images and the rotated anchor boxes 8b for all cells.
  • the computing unit 3 may apply the trained bounding box algorithm to the image 5 to determine the bounding box 6b by fitting the rotated anchor box 8b to the object 7.
  • In step S5, the computing unit 3 may control the vehicle 1 at least in part automatically depending on the bounding box 6b.
  • In Fig. 3, a flow diagram of a further exemplary implementation of a method for object detection according to the improved concept is shown.
  • the method according to Fig. 3 is based on the method according to Fig. 2. Therefore, only differences are explained.
  • the computing unit 3 determines, in addition to the projection vector P, a first auxiliary vector P1 and a second auxiliary vector P2.
  • the auxiliary vectors P1, P2 have the projection center point C as an origin and lie within the projection plane. They include the same predefined angle α with respect to the projection vector P.
  • the computing unit 3 maps the respective end points of the auxiliary vectors P1, P2 onto respective mapped points P1, P2 on the image plane 10 in step S2a.
  • in step S2b, the computing unit 3 determines a straight line connecting the mapped points P1, P2 to each other and interprets this straight line as an approximation to the tangent direction 12.
  • the steps S2a and S2b may replace the respective method steps for determining the tangent direction 12 as explained with respect to Fig. 2.
  • Steps S3 to S5 are the same as described with respect to Fig. 2. Therefore they are not explicitly shown in Fig. 3.
  • Fig. 4 shows a flow diagram of a further exemplary implementation of a method for object detection according to the improved concept.
  • the steps S1 to S3 are identical to the steps S1 to S3 according to the method depicted with respect to Fig. 2 or to the steps S1 , S2a, S2b and S3 as described with respect to Fig. 3.
  • the computing unit 3 applies a trained bounding box algorithm to the image 5 to determine an initial bounding box 6a by fitting the initial, non-rotated, anchor box 8a to the object 7.
  • In step S7, the computing unit 3 then determines the bounding box 6b by rotating the initial bounding box 6a depending on the rotation parameter.
  • object detection and in particular bounding box determination can be performed with an improved accuracy and reliability for various poses of the camera and for arbitrary mapping functions of the camera.
  • the complexity of a respective bounding box algorithm for example a CNN, may be kept low.
  • the rotation parameter may be for example obtained from intrinsic and/or extrinsic calibration information and may be used to predict rotated boxes for object detection. Based on the intrinsic and/or extrinsic calibration parameters of the camera, the rotation angle of the center point of every cell may be computed as described. This may for example then be used to obtain a set of rotated anchors for every cell. The rotation angle may for example be computed once offline and saved as a look-up table in case of real-time constraints on computational resources. As described, the improved concept is beneficial for non-rectilinear cameras but may also be used to compensate for perspective effects, independent of the mapping function of the camera.
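Such an offline look-up table can be sketched as follows; the grid size and the placeholder angle function are hypothetical, and in practice `angle_for_cell` would run the calibration-based procedure described above for the centre point of every cell.

```python
import numpy as np

def build_rotation_lut(grid_w, grid_h, angle_for_cell):
    """Precompute the per-cell rotation angles once (offline) and store them in a
    look-up table, so nothing has to be recomputed at inference time."""
    lut = np.zeros((grid_h, grid_w), dtype=np.float32)
    for row in range(grid_h):
        for col in range(grid_w):
            lut[row, col] = angle_for_cell(col, row)
    return lut

# hypothetical placeholder: in practice angle_for_cell would evaluate the
# calibration-based procedure described above for the centre point of each cell
lut = build_rotation_lut(grid_w=20, grid_h=12, angle_for_cell=lambda col, row: 0.0)
np.save("rotation_lut.npy", lut)   # reload this file at runtime instead of recomputing
```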
  • the improved concept can be realized as a post-processing.
  • an object detection CNN may be applied and the rotation angles may be computed according to the improved concept. Then the bounding box may be rotated afterwards.
  • the rotation angle does not need to be discretized. Since the rotation angle may be obtained from the calibration parameters themselves, the rotation angle does not have to be discretized upfront. This provides a very accurate representation of objects.

Abstract

According to a method for object detection, a computing unit (3) is used to receive an image (5) from a camera (4) and to divide the image (5) into a plurality of cells and select one of the cells. The computing unit (3) is used to determine a bounding box (6b) for an object (7) on the image (5) by fitting an anchor box (8b) to the object (7), wherein at least a part of the object (7) is located in the selected cell. The computing unit (3) is used to retrieve a rotation parameter from a storage medium (9) and to determine the anchor box (8b) by rotating an initial anchor box (8a) or determine the bounding box (6b) by rotating an initial bounding box (6a).

Description

Object detection and guiding a vehicle
The present invention relates to a method for object detection, wherein a computing unit is used to receive an image from a camera and to divide the image into a plurality of cells and select one of the cells and to determine a bounding box for an object on the image by fitting an anchor box to the object, wherein at least a part of the object is located in the selected cell. The invention further relates to a corresponding method for guiding a vehicle at least in part automatically, to an electronic vehicle guidance system and to a computer program product.
Deep learning may be used for example for object detection, classification or segmentation. It may be used for various driver assistance applications or autonomous driving. For object detection and pedestrian detection, for example single shot detectors may be used.
Such detectors may divide an image into multiple smaller cells and within these cells they have a pre-defined set of anchors, which are used to fit the objects that lie within that cell. These anchors assume that the objects are vertical and standing on a flat ground plane.
A description of such detectors is given by M. Hollemans: "One-stage object detection", June 09, 2018, https://machinethink.net/blog/object-detection, retrieved on November 11, 2019.
However, the assumption that the objects are vertical and standing on a flat ground plane is only reasonable in case there are neither distortion nor perspective effects in the image. Hence, said object detection algorithms work well only on rectilinear or similar projections, where the depicted objects are vertical and the ground plane is flat. The use of a non-rectilinear camera, however, may lead to curved images of objects that are straight lines and vertical or horizontal in the real world, such as the ground. Furthermore, depending on the position and orientation of the camera with respect to the environment of the vehicle, vertical or horizontal lines in the real world may appear tilted in the image. Therefore the existing approaches are limited in their accuracy, for example for determining a bounding box for an object on the image. This, in consequence, may lead to less reliable or more conservative object detection or distance estimation based on the non-optimal bounding box.
It is therefore an object of the present invention, to provide an improved concept for object detection, which improves the accuracy of the object detection, in particular of a bounding box determined for an object in the environment.
This object is solved by the respective subject matter of the independent claims. Further implementations and preferred embodiments are subject matter of the dependent claims.
The improved concept is based on the idea to replace a constant reference direction, such as vertical or horizontal direction, throughout the image by a locally varying or location-dependent reference direction or, in particular, a locally dependent vertical or horizontal direction. To this end, an anchor box is fitted to the object to determine a bounding box. Other than in known approaches, the anchor box or the bounding box are determined by rotating an initial anchor box or an initial bounding box, respectively, by a location-dependent angle or rotation parameter.
According to the improved concept, a method for object detection is provided. According to the method, a computing unit, in particular a computing unit of a vehicle, is used to receive an image from a camera, in particular a camera of the vehicle, and the computing unit is used to divide the image into a plurality of cells and to select one cell of the plurality of cells. The computing unit is used to determine a bounding box for an object on the image by fitting an anchor box to the object, wherein at least a part of the object is located in the selected cell. The computing unit is used to retrieve a predetermined rotation parameter from a storage medium, in particular from a storage medium of the computing unit or of the camera or of an electronic vehicle guidance system of the vehicle, wherein the rotation parameter, in particular the value of the rotation parameter, depends on a location of the selected cell within the image.
The computing unit is used to determine the anchor box by rotating a predefined initial anchor box depending on the rotation parameter. Alternatively, the computing unit is used to determine the bounding box by rotating an initial bounding box depending on the rotation parameter, wherein the initial bounding box is, in particular, determined by fitting the initial anchor box to the object. Dividing the image into the plurality of cells may be understood such that a predefined grid, for example a rectangular or quadratic grid, is superposed to the image.
The described steps of selecting a cell, determining the bounding box, retrieving the rotation parameter and determining the anchor box or bounding box by rotation of the initial anchor box or the initial bounding box, respectively, may for example be carried out for all of the grid cells, provided an object or a part of an object is present in the respective cell.
Consequently, the method steps described above relate to one specific cell of the plurality of cells, namely the selected cell, but can be carried out for all other cells as well to perform the object detection on the full image.
The anchor box may belong to a set of anchor boxes associated to the respective selected cell. Each anchor box of the set can be understood as a typical or potential bounding box for objects to be expected in the image. For example, the different anchor boxes of the set of anchor boxes of the selected cell may have different sizes and/or different shapes.
In particular the anchor box may be of rectangular or quadratic shape. However, this is not necessarily the case. It is sufficient that the anchor box has a well-defined orientation, which, in case of a rectangle, may be given by the direction of one of the sides of the rectangle. However, in case of other shapes, other well-defined orientations may be defined.
The initial anchor box may also be an initial anchor box of a set of initial anchor boxes associated to the selected cell. Therein, the respective sets of initial anchor boxes may be identical for all cells of the grid. This may not be the case for the various sets of anchor boxes of different cells, which may depend on the respective rotation parameter in some implementations.
Fitting the anchor box to the object may for example be understood as selecting the anchor box of the respective set of anchor boxes, which fits the object best. In other words, fitting the anchor box may correspond to selecting the optimal anchor box of the set of anchor boxes or to a minimization of an error for approximating the object by the respective anchor box. In particular, the rotation parameter corresponds to a rotation angle with respect to a predefined reference direction defined globally for the whole image. In other words, the reference direction is the same for all cells. For example, in case the image has an overall rectangular shape, the reference direction may correspond to the vertical or horizontal direction according to one of the sides of the image. This is, in case of a rectangular grid, equivalent to choosing a grid direction as the reference direction.
Determining the bounding box for the object can be understood as an object detection task. The object detection may or may not comprise further tasks. In particular, the object detection may comprise determining further bounding boxes for further objects on the image, in particular in the same way as described for the bounding box and the selected cell.
In some implementations, the object detection may also comprise further tasks based on the bounding box or using the bounding box, such as object tracking or segmentation tasks.
Determining the bounding box by fitting the anchor box to the object may for example be carried out by applying a trained algorithm, in particular an algorithm based on machine learning and/or computer vision, to the image or to the individual cells.
The predefined rotation parameter may for example be provided to the algorithm. Alternatively, determining the rotation parameter may be part of the algorithm.
The trained algorithm may for example be based on a trained artificial neural network, for example a convolutional neural network, CNN.
As described, a method according to the improved concept effectively uses local anchor boxes or local bounding boxes having location-dependent orientations with respect to the reference direction. In this way, deviations from the assumption regarding vertical objects on flat ground in the image may be compensated, which results in a higher accuracy of the bounding box and the object detection.
In particular, the camera may not necessarily map horizontal or vertical lines in the real world to horizontal or vertical lines, respectively, in the image for various reasons. One reason may be that the camera is a non-rectilinear camera, such as a fisheye camera. Such cameras do in general not map straight lines in the real world to straight lines on the image. Therefore, the assumption stated above is intrinsically wrong in this case. Another reason may be that, even for rectilinear cameras, vertical objects or horizontal objects in the real world do not necessarily map to vertical images or horizontal images of said object due to perspective effects. In other words, this may be the case if the object is not positioned on an optical axis of the camera. Both sources of inaccuracy may be compensated by the improved concept.
In consequence, more accurate bounding boxes and therefore a more reliable object detection may be achieved.
A non-rectilinear camera can be understood as a camera with a non-rectilinear lens or lens unit. A non-rectilinear lens or lens unit can be understood as a lens or lens unit, that is one or more lenses, having a non-rectilinear mapping function, also denoted as curvilinear mapping function. In particular, fisheye cameras represent non-rectilinear cameras.
The mapping function of the lens or lens unit can be understood as a function r(θ) mapping an angle θ from the optical axis of the lens or lens unit to a radial shift r out of the image center. The function depends parametrically on the focal length f of the lens or lens unit.
For example, a rectilinear lens or lens unit has a gnomonic mapping function, in particular r(θ) = f tan(θ). In other words, a rectilinear lens or lens unit maps straight lines in the real world to straight lines in the image, at least up to lens imperfections.
A non-rectilinear or curvilinear lens or lens unit does, in general, not map straight lines to straight lines in the image. In particular, the mapping function of a non-rectilinear camera can be stereographic, equidistant, equisolid angle or orthographic. Other examples of mapping functions of non-rectilinear lens units are polynomial functions.
Bounding boxes may for example be used for determining the distance between the camera and the object. In case of an incorrect or inaccurate bounding box, the distance may be estimated smaller than it actually is. This leads to pessimistic distance estimations, for example in the case of automotive applications. The improved concept may therefore also lead to less pessimistic distance estimations. This may be particularly beneficial in the context of partly or fully autonomous driving or parking applications. According to several implementations of the method according to the improved concept, the camera is used to generate the image. According to several implementations, the computing unit is used to estimate a distance between the object and the camera depending on the bounding box.
In particular, the computing unit may be used to determine the position of a foot point or reference point of the bounding box and estimate the distance between the camera and the object depending on the position of the foot point or reference point. In case of a rectangular bounding box, the foot point may for example correspond to a point on a lower side of the rectangle, in particular a center point of the lower side.
According to several implementations, the computing unit is used to rotate the initial anchor box depending on the rotation parameter to determine the anchor box. Then, that is after rotating the initial anchor box, the computing unit is used to determine the bounding box by fitting the rotated initial anchor box to the object.
In such implementations, the anchor box used for determining the bounding box corresponds to the rotated initial anchor box.
In such implementations, only one bounding box per cell is involved, namely the bounding box determined based on the rotated initial anchor box.
Such implementations allow for the highest accuracy of determining the bounding box, since the trained algorithm may for example be trained based on the rotated initial anchor boxes.
According to several implementations, a bounding box algorithm is trained, in particular by using a training computing unit, based on training images and on the rotated anchor boxes. The computing unit is used to apply the trained bounding box algorithm to the image to determine the bounding box by fitting the rotated initial anchor box to the object.
The bounding box algorithm may for example comprise the neural network or the CNN.
In particular, the training is performed based on all rotated initial anchor boxes of all cells of the plurality of cells. As described above, such implementations allow for highest accuracy of determining the bounding box.
According to several implementations, the computing unit is used to determine the initial bounding box by fitting the initial anchor box to the object. Then, that is after determining the initial bounding box, the computing unit is used to determine the bounding box by rotating the initial bounding box depending on the rotation parameter.
In such implementations, the initial anchor box corresponds to the anchor box used for determining the bounding box.
Such implementations may have the advantage that the method can be used as a post processing, in case the bounding box algorithm itself cannot or shall not be modified.
According to several implementations, the bounding box algorithm is trained, in particular by the training computing unit, based on training images and on the initial anchor boxes. The computing unit is used to apply the trained bounding box algorithm to the image to determine the initial bounding box by fitting the initial anchor box to the object.
In particular, the training is performed based on all initial anchor boxes of all cells of the plurality of cells.
The rotation parameter is then used only after the training is completed. Consequently, such implementations are suitable for post-processing of the bounding boxes in the images.
According to several implementations, the computing unit is used to select an initial point on an image plane of the camera, wherein the initial point is associated to the selected cell. The computing unit is used to generate a projection vector pointing from a projection center point of the camera to a projection point, wherein a mapping function of the camera maps the projection point to the initial point. The computing unit is used to determine the rotation parameter depending on the projection vector.
In particular, the computing unit is used to store the rotation parameter to the storage medium. The initial point on the image plane is given by two-dimensional coordinates on the image plane and is, in particular, independent of any information contained by the image. The initial point only corresponds to a position and has no information content corresponding to the image.
For example, the image plane corresponds to an active surface or a part of the active surface of an image sensor of the camera.
The initial point being associated to the cell can be understood such that the initial point has a well-defined position with respect to the cell, for example corresponds to a center point of the cell, a corner point of the cell or to a point with a defined relationship with respect to the center point or corner point.
The projection center point of the camera corresponds to a projection center point of a lens of the camera. For example, the projection center point may correspond to the center of the lens.
Since the mapping function maps the projection point to the initial point, the initial point is mapped to the projection point by the inverse mapping function. In other words, in case the initial point is known, the projection point can be determined by applying the inverse mapping function to the initial point.
The mapping function is, in particular, determined upfront, that is before the method according to the improved concept is carried out. The mapping function may be determined during a calibration phase of the camera. The mapping function may be saved or stored to the storage element. The mapping function can also be considered as one or more intrinsic calibration parameter of the camera.
In case of a rectilinear camera, the projection vector also points from the initial point to the projection center point. However, for a non-rectilinear camera, this may not be the case.
In particular, the projection vector can be understood as a bound vector.
According to several implementations, the computing unit is used to determine a reference vector depending on a pose of the camera and to construct a projection plane containing the reference vector and the projection vector and to determine the rotation parameter depending on the projection plane.
The pose of the camera is given by a position and an orientation of the camera. In particular, the pose corresponds to the position and orientation of a sensor coordinate system of the camera with respect to a reference coordinate system, for example a reference coordinate system rigidly connected to a vehicle to which the camera may be mounted.
The pose is therefore given by six parameters including three translational parameters defining the position of the camera, in particular the translational shift of the sensor coordinate system with respect to the reference coordinate system in the three spatial dimensions. The six parameters further comprise three angular parameters, which may for example be given by a roll angle, a pitch angle, and a yaw angle.
Roll angle, pitch angle, and yaw angle may be defined as rotation angles or Euler angles of the sensor coordinate system with respect to the reference coordinate system according to a predefined convention. The convention may for example be that the sensor coordinate system results from the reference coordinate system due to the following three rotations. Therein, it is assumed that the sensor coordinate system and the reference coordinate system are initially identical to each other. The sensor coordinate system is rotated around the z-axis of the reference coordinate system by the yaw angle. Then, the resulting sensor coordinate system is rotated around the resulting y-axis of the resulting sensor coordinate system by the pitch angle. Then, the resulting sensor coordinate system is rotated around the resulting x-axis of the resulting sensor coordinate system by the roll angle. Different conventions are possible as well.
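As a sketch of the convention described above, the orientation of the sensor coordinate system with respect to the reference coordinate system may be composed from the three angles as follows; this is an illustrative Python example with angles assumed to be given in radians, not a prescribed implementation.

```python
import numpy as np

def rotation_from_yaw_pitch_roll(yaw, pitch, roll):
    """Orientation of the sensor coordinate system with respect to the
    reference coordinate system, following the intrinsic z-y'-x'' convention
    described above (rotate by yaw, then pitch, then roll)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])  # yaw about z
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])  # pitch about y'
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])  # roll about x''
    return Rz @ Ry @ Rx   # intrinsic rotations compose by right-multiplication
```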
The reference coordinate system may be rigidly fixed to a host object on which the camera is mounted, wherein the host object may be the vehicle.
The pose, in particular the six parameters defining the pose, can also be considered as extrinsic calibration parameters.
The reference vector corresponds to the direction with respect to which the rotation parameter shall be defined. For example, the reference vector may correspond to a vertical direction in the real world, to a vertical axis in the reference coordinate system, or to a vertical axis of the sensor coordinate system. The reference vector is, in particular, a bound vector.
In particular, the reference vector may correspond to an axis of the reference coordinate system rotated according to the roll angle, the pitch angle and the yaw angle.
The reference vector has the same origin as the projection vector, namely the projection center point. The projection plane therefore also contains the projection center point. By the respective implementations, the intrinsic calibration information as well as the extrinsic calibration information of the camera is taken into account to achieve a particularly high accuracy.
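A minimal sketch of constructing the projection plane from the two vectors, representing the plane by its unit normal (consistent with the cross product used later with respect to Fig. 2), may look as follows; the vectors are assumed to be given in a common coordinate system with the projection center point as origin.

```python
import numpy as np

def projection_plane_normal(projection_vector, reference_vector):
    """Unit normal of the projection plane spanned by the projection vector
    and the reference vector; both originate at the projection center point."""
    n = np.cross(projection_vector, reference_vector)
    norm = np.linalg.norm(n)
    if norm < 1e-12:
        raise ValueError("projection and reference vector are parallel")
    return n / norm
```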
According to several implementations, the computing unit is used to determine the reference vector depending on an orientation of the camera, in particular independent of a position of the camera.
In other words, the reference vector may be determined depending on the roll angle, the pitch angle and the yaw angle only.
According to several implementations, the computing unit is used to map the projection plane onto a line in the image plane depending on the mapping function. The computing unit is used to determine a tangent direction to the line at the initial point and to determine the rotation parameter depending on the tangent direction.
In other words, the tangent direction corresponds to the rotation parameter in the sense that the tangent direction includes an angle with the reference vector corresponding to the rotation angle. Mapping the projection plane depending on the mapping function corresponds to mapping each point on the projection plane by applying the mapping function to that point. Since the projection plane contains the projection center point, the plane is mapped onto a line in the image plane. The line may be straight or curved, depending on the mapping function. In particular, the line may be straight for a rectilinear camera, while it may be curved for a non-rectilinear camera. By construction, the initial point also lies on the line. In such implementations, the tangent at the initial point is used as a local reference direction, which may, in the respective implementations, correspond to a local vertical direction or horizontal direction of the selected cell.
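For illustration, mapping the projection plane onto the line and estimating the tangent direction at the initial point may be sketched as follows; the function project stands for the calibrated mapping function (from a 3D direction to image coordinates), and both this placeholder and the sampling span are assumptions.

```python
import numpy as np

def mapped_line_and_tangent(p_vec, n_vec, project, num=181, span=np.pi / 2):
    """Trace the image of the projection plane (the line in the image plane)
    by rotating the projection vector p_vec about the plane normal n_vec and
    mapping each direction with the camera mapping function 'project'.
    Returns the sampled line and the tangent direction at the initial point."""
    def rotate_about(axis, v, angle):
        # Rodrigues' rotation of v about the unit axis
        return (v * np.cos(angle)
                + np.cross(axis, v) * np.sin(angle)
                + axis * np.dot(axis, v) * (1.0 - np.cos(angle)))

    angles = np.linspace(-span, span, num)       # num is odd, so angle 0 is in the middle
    line = np.array([project(rotate_about(n_vec, p_vec, a)) for a in angles])
    i0 = num // 2                                # sample corresponding to the initial point
    tangent = line[i0 + 1] - line[i0 - 1]        # central difference around the initial point
    return line, tangent / np.linalg.norm(tangent)
```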
According to several implementations, the computing unit is used to determine a first auxiliary vector and a second auxiliary vector. Both auxiliary vectors have the projection center point as respective origins and lie within the projection plane. Both auxiliary vectors include the same predefined angle with the projection vector. The computing unit is used to map respective end points of both auxiliary vectors onto respective mapped points on the image plane depending on the mapping function. The computing unit is used to determine the rotation parameter depending on a straight line connecting the mapped points to each other. The auxiliary vectors including the same angle with the projection vector can be understood such that the absolute values of the respective angles included with the projection vector are the same.
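Purely as an illustration of such implementations, the two auxiliary vectors and the resulting rotation angle may be sketched as follows; project again stands for the calibrated mapping function, and measuring the angle against the image column (vertical) direction is an assumed convention.

```python
import numpy as np

def rotation_angle_from_auxiliary_vectors(p_vec, n_vec, project, alpha=0.01):
    """Approximate the rotation parameter for one cell from two auxiliary
    vectors enclosing the same small angle alpha with the projection vector."""
    p = p_vec / np.linalg.norm(p_vec)
    u = np.cross(n_vec, p)                      # in-plane direction orthogonal to p
    u /= np.linalg.norm(u)
    a1 = np.cos(alpha) * p + np.sin(alpha) * u  # first auxiliary vector
    a2 = np.cos(alpha) * p - np.sin(alpha) * u  # second auxiliary vector
    p1, p2 = project(a1), project(a2)           # mapped points on the image plane
    dx, dy = p1[0] - p2[0], p1[1] - p2[1]
    # inclination of the connecting straight line against the image column
    # (vertical) direction; this axis convention is an assumption
    return np.arctan2(dx, dy)
```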
The straight line connecting the mapped points is, by construction, an approximation to the tangent direction, at the initial point, of the line described above. Consequently, the predefined angle must be chosen small enough. In other words, the error made by approximating the tangent direction by the straight line connecting the mapped points increases with increasing absolute value of the angle. Such implementations may be used in case an exact expression or a closed parametric representation of the line corresponding to the mapped projection plane is not available or cannot be determined. Furthermore, such implementations may reduce the computational effort to determine the rotation parameter.

According to several implementations, the computing unit is used to determine the anchor box by rotating the initial anchor box depending on the rotation parameter. The anchor box, and in particular also the initial anchor box, has the shape of a rectangle, wherein a side of the rectangle is parallel to the tangent direction or to the approximate tangent direction given by the straight line connecting the mapped points.
According to several implementations, the computing unit is used to determine the bounding box by rotating the initial bounding box depending on the rotation parameter. The bounding box, and in particular also the initial bounding box, has the shape of a rectangle, wherein a side of the rectangle is parallel to the tangent direction or to the approximate tangent direction given by the straight line connecting the mapped points.
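A minimal sketch of such a rotation, applicable to an anchor box as well as to an initial bounding box represented by its four corner points, may look as follows; the corner-point representation of the box is an assumption made for illustration.

```python
import numpy as np

def rotate_box(corners, angle):
    """Rotate a rectangular box, given as a (4, 2) array of corner coordinates,
    about its center by the rotation parameter (in radians)."""
    corners = np.asarray(corners, dtype=float)
    center = corners.mean(axis=0)
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    return (corners - center) @ rot.T + center
```

For example, rotate_box([[10, 10], [50, 10], [50, 30], [10, 30]], 0.2) yields the corner coordinates of the correspondingly rotated rectangle.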
According to several implementations, the steps of retrieving the rotation parameter and determining the anchor box by rotating the predefined initial anchor box or determining the bounding box by rotating the initial bounding box as well as the step of determining the bounding box by fitting the anchor box are performed for each cell of the plurality of cells.
Consequently, steps described for determining the rotation parameter may be performed for each of the cells of the plurality of cells.
According to the improved concept, also a method for guiding a vehicle at least in part automatically is provided. A camera of the vehicle is used to generate an image depicting an environment of the vehicle and the vehicle is guided at least in part automatically, in particular by using an electronic vehicle guidance system, depending on a bounding box for an object on the image. In order to determine the bounding box, a method for object detection according to the improved concept is carried out, in particular by the electronic vehicle guidance system.
According to several implementations of the method for guiding a vehicle, the method is designed as a method for parking the vehicle at least in part automatically.
For parking applications, the improved concept is particularly suitable, since, as described above, too pessimistic distance estimations may be avoided.
According to the improved concept, also an electronic vehicle guidance system comprising a computing unit is provided. The computing unit is configured to receive an image from a camera and to divide the image into a plurality of cells and select one cell of the plurality of cells. The computing unit is configured to determine a bounding box for an object on the image by fitting an anchor box to the object, wherein at least a part of the object is located in the selected cell. The computing unit is configured to retrieve a predefined rotation parameter from a storage medium, in particular from a storage medium of the computing unit or the camera or the electronic vehicle guidance system, wherein the rotation parameter depends on a location of the selected cell within the image. The computing unit is configured to determine the anchor box by rotating a predefined initial anchor box depending on the rotation parameter or to determine the bounding box by rotating an initial bounding box depending on the rotation parameter.
An electronic vehicle guidance system may be understood as an electronic system, configured to guide a vehicle in a fully automated or a fully autonomous manner and, in particular, without a manual intervention or control by a driver or user of the vehicle being necessary. The vehicle conducts required steering maneuvers, braking maneuvers and/or acceleration maneuvers and so forth automatically. In particular, the electronic vehicle guidance system may implement a fully automatic or fully autonomous driving mode according to level 5 of the SAE J3016 classification. An electronic vehicle guidance system may also be implemented as an advanced driver assistance system, ADAS, assisting a driver for partially automatic or partially autonomous driving. In particular, the electronic vehicle guidance system may implement a partly automatic or partly autonomous driving mode according to levels 1 to 4 of the SAE J3016 classification. Here and in the following, SAE J3016 refers to the respective standard dated June 2018.
Guiding the vehicle at least in part automatically may therefore comprise guiding the vehicle according to a fully automatic or fully autonomous driving mode according to level 5 of the SAE J3016 classification. Guiding the vehicle at least in part automatically may also comprise guiding the vehicle according to a partly automatic or partly autonomous driving mode according to levels 1 to 4 of the SAE J3016 classification.
According to several implementations of the electronic vehicle guidance system, the vehicle guidance system comprises the camera and the camera is designed as a non-rectilinear camera, in particular as a fisheye camera.
Further implementations of the electronic vehicle guidance system follow directly from the various implementations of the method for object detection according to the improved concept and from the various implementations of the method for guiding a vehicle according to the improved concept and vice versa respectively. In particular, the electronic vehicle guidance system may be configured to or programmed to perform a method according to the improved concept or the electronic vehicle guidance system performs such a method.
According to the improved concept, also a computer program comprising instructions is provided. According to several implementations of the computer program, the instructions, when they are executed by a computer system, cause the computer system to carry out a method according to the improved concept.
The computer system may comprise one or more computing units, for example the computing unit of the electronic vehicle guidance system and optionally the training computing unit.
According to several implementations of the computer program, the instructions, when they are executed by an electronic vehicle guidance system according to the improved concept, in particular by the computing unit of the vehicle guidance system, cause the vehicle guidance system to carry out a method according to the improved concept.
According to the improved concept, also a computer-readable storage medium storing a computer program according to the improved concept is provided.
The computer program as well as the computer-readable storage medium according to the improved concept can be considered as respective computer program products comprising the instructions.
Further features of the invention are apparent from the claims, the figures and the description of figures. The features and feature combinations mentioned above in the description as well as the features and feature combinations mentioned below in the description of figures and/or shown in the figures alone are usable not only in the respectively specified combination, but also in other combinations without departing from the scope of the invention. Thus, implementations are also to be considered as encompassed and disclosed by the invention, which are not explicitly shown in the figures and explained, but arise from and can be generated by separated feature combinations from the explained implementations. Implementations and feature combinations are also to be considered as disclosed, which do not have all of the features of an originally formulated independent claim. Moreover, implementations and feature combinations are to be considered as disclosed, in particular by the implementations set out above, which extend beyond or deviate from the feature combinations set out in the relations of the claims.
In the Figures, Fig. 1 shows schematically a vehicle with an exemplary implementation of an electronic vehicle guidance system according to the improved concept;
Fig. 2 shows a flow diagram of an exemplary implementation of a method according to the improved concept;
Fig. 3 shows a flow diagram of a further exemplary implementation of a method according to the improved concept; and Fig. 4 shows a flow diagram of a further exemplary implementation of a method according to the improved concept.
In Fig. 1, a vehicle 1 comprising an electronic vehicle guidance system 2 according to the improved concept is shown.
The vehicle guidance system 2 comprises a computing unit 3, which may be part of or comprise an electronic computing unit, ECU, of the vehicle 1. The vehicle guidance system 2 further comprises a camera 4, for example a fisheye camera. The vehicle guidance system 2 or the vehicle 1 further comprises a storage medium 9, which is coupled to the computing unit 3.
The functionality of the vehicle guidance system 2 is explained in more detail in the following with respect to implementations of methods according to the improved concept and in particular with reference to Fig. 2 to Fig. 4.
Fig. 2 shows a flow diagram of an exemplary implementation of a method for object detection according to the improved concept.
In step S1, the camera 4 generates an image 5 of an environment of the vehicle 1 and provides it to the computing unit 3. The computing unit 3 divides the image 5 into a plurality of cells and selects one of them.
The computing unit 3 selects an initial point P on an image plane 10 of the camera 4, wherein the initial point P is associated to the selected cell. The image plane 10 may for example correspond to the active surface of an image sensor of the camera 4 or to a part of the active surface. For example, the initial point P may correspond to the center of the selected cell, which may for example be a rectangular or square cell. The computing unit 3 generates a projection vector P pointing from a projection center point C of the camera 4 to a projection point. The projection point corresponds to a point in the real world, which results from applying an inverse mapping function of the camera 4 to the initial point P. In other words, the mapping function of the camera 4 maps the projection point to the initial point P. In case the camera 4 is a non-rectilinear camera, such as a fisheye camera, the mapping function is a non-gnomonic function.
The computing unit 3 determines a reference vector V depending on a pose, in particular an orientation, of the camera 4 with respect to a vehicle coordinate system rigidly connected to the vehicle.
In particular, the reference vector V may correspond to a vertical axis of a sensor coordinate system rigidly connected to the camera, as depicted in Fig. 2. The computing unit 3 constructs a projection plane containing the reference vector V and the projection vector P. For example, the projection plane may be defined by its normal vector N given by N = P × V, wherein "×" denotes the vector product or cross product. By construction, the projection center point C lies on the projection plane.
Furthermore, the projection plane can be considered as vertical, in case the reference vector V is interpreted as a global or overall vertical direction.
In step S2, the projection plane is then projected back into the image plane 10 by the computing unit 3. To this end, the projection plane is mapped onto a line 11 in the image plane 10 depending on the mapping function. In other words, the mapping function is applied by the computing unit 3 to each of the points on the projection plane or on a correspondingly discretized projection plane, to map those points to the image plane 10.
Then, the computing unit 3 determines a tangent direction 12 to the line 11 at the initial point P, which, by construction, lies on the line 11. In particular, depending on the mapping function, the computing unit 3 may be able to determine an exact or parametric representation of the line 11 and compute the tangent direction 12 based on this representation. In step S3, the computing unit 3 determines the rotation parameter for the selected cell depending on the tangent direction 12. In other words, the angle the tangent direction includes with the reference vector V corresponds to an angle, which defines the rotation parameter.
An initial anchor box 8a, corresponding to a rectangle oriented according to the vertical reference vector V, is rotated by the rotation parameter to obtain a rotated anchor box 8b. The described steps may be repeated for all cells of the plurality of cells.
A bounding box algorithm may be trained based on training images and the rotated anchor boxes 8b for all cells. In step S4, the computing unit 3 may apply the trained bounding box algorithm to the image 5 to determine the bounding box 6b by fitting the rotated anchor box 8b to the object 7.
In step S5, the computing unit may control the vehicle 1 at least in part automatically depending on the bounding box 6b.
In Fig. 3 a flow diagram of a further exemplary implementation of a method for object detection according to the improved concept is shown. The method according to Fig. 3 is based on the method according to Fig. 2. Therefore, only differences are explained.
According to Fig. 3, the computing unit 3 determines, in addition to the projection vector P, a first auxiliary vector P1 and a second auxiliary vector P2. The auxiliary vectors P1, P2 have the projection center point C as an origin and lie within the projection plane. They include the same predefined angle a with respect to the projection vector P.
Furthermore, the computing unit 3 maps the respective end points of the auxiliary vectors P1, P2 onto respective mapped points P1, P2 on the image plane 10 in step S2a. In step S2b, the computing unit 3 determines a straight line connecting the mapped points P1, P2 to each other and interprets this straight line as an approximation to the tangent direction 12. The steps S2a and S2b may replace the respective method steps for determining the tangent direction 12 as explained with respect to Fig. 2.
Steps S3 to S5 are the same as described with respect to Fig. 2. Therefore they are not explicitly shown in Fig. 3.
Fig. 4 shows a flow diagram of a further exemplary implementation of a method for object detection according to the improved concept.
The steps S1 to S3 are identical to the steps S1 to S3 according to the method depicted with respect to Fig. 2 or to the steps S1 , S2a, S2b and S3 as described with respect to Fig. 3.
However, according to the method of Fig. 4, the computing unit 3 applies a trained bounding box algorithm to the image 5 to determine an initial bounding box 6a by fitting the initial, non-rotated, anchor box 8a to the object 7.
In step S7, the computing unit 3 then determines the bounding box 6b by rotating the initial bounding box 6a depending on the rotation parameter.
In this way, the proposed approach can be applied as a post processing step, if necessary.
As described, according to the improved concept, object detection and in particular bounding box determination can be performed with an improved accuracy and reliability for various poses of the camera and for arbitrary mapping functions of the camera.
By means of the improved concept, the complexity of a respective bounding box algorithm, for example a CNN, may be kept low.
The rotation parameter may for example be obtained from intrinsic and/or extrinsic calibration information and may be used to predict rotated boxes for object detection. Based on the intrinsic and/or extrinsic calibration parameters of the camera, the rotation angle of the center point of every cell may be computed as described. This may then be used to obtain a set of rotated anchors for every cell. The rotation angle may for example be computed once offline and saved as a look-up table in case of real-time constraints on computational resources. As described, the improved concept is beneficial for non-rectilinear cameras but may also be used to compensate for perspective effects, independent of the mapping function of the camera.
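A possible offline computation of such a look-up table may be sketched as follows; the grid and image dimensions as well as the helper angle_for_point, which wraps the unprojection, projection plane and tangent steps described above, are assumptions made for illustration.

```python
import numpy as np

def build_rotation_lut(grid_w, grid_h, img_w, img_h, angle_for_point):
    """Precompute the rotation angle for the center point of every cell and
    store it as a look-up table indexed by cell row and column."""
    lut = np.zeros((grid_h, grid_w))
    cell_w, cell_h = img_w / grid_w, img_h / grid_h
    for row in range(grid_h):
        for col in range(grid_w):
            cx = (col + 0.5) * cell_w          # cell center, x coordinate
            cy = (row + 0.5) * cell_h          # cell center, y coordinate
            lut[row, col] = angle_for_point((cx, cy))
    return lut
```

At run time, the table is only indexed by the cell position, so no projection computations are required per image.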
In some implementations, the improved concept can be realized as a post-processing. In such a scenario, an object detection CNN may be applied and the rotation angles may be computed according to the improved concept. Then the bounding box may be rotated afterwards.
According to the improved concept, the rotation angle does not need to be discretized. Since the rotation angle may be obtained from the calibration parameters themselves, the rotation angle does not have to be discretized upfront. This provides a very accurate representation of objects.

Claims
1. Method for object detection, wherein a computing unit (3) is used to
- receive an image (5) from a camera (4) and to divide the image (5) into a plurality of cells and select one of the cells; and
- determine a bounding box (6b) for an object (7) on the image (5) by fitting an anchor box (8b) to the object (7), wherein at least a part of the object (7) is located in the selected cell;
characterized in that the computing unit (3) is used to
- retrieve a rotation parameter from a storage medium (9), wherein the rotation parameter depends on a location of the selected cell within the image (5); and
- determine the anchor box (8b) by rotating a predefined initial anchor box (8a) depending on the rotation parameter or determine the bounding box (6b) by rotating an initial bounding box (6a) depending on the rotation parameter,
the computing unit (3) being further used to
- select an initial point (P), which is associated to the selected cell, on an image plane (10) of the camera (4);
- generate a projection vector (P) pointing from a projection center point (C) of the camera (4) to a projection point, wherein a mapping function of the camera (4) maps the projection point to the initial point (P); and
- determine the rotation parameter depending on the projection vector (P).
2. Method according to claim 1, characterized in that the computing unit (3) is used to
- rotate the initial anchor box (8a) depending on the rotation parameter; and then
- determine the bounding box (6b) by fitting the rotated initial anchor box (8b) to the object (7).
3. Method according to claim 2, characterized in that
- a bounding box algorithm is trained based on training images and based on the rotated anchor boxes (8b); and
- the computing unit (3) is used to apply the trained bounding box algorithm to the image (5) to determine the bounding box (6b) by fitting the rotated initial anchor box (8b) to the object (7).
4. Method according to claim 1, characterized in that the computing unit (3) is used to
- determine the initial bounding box (6a) by fitting the initial anchor box (8a) to the object (7); and then
- determine the bounding box (6b) by rotating the initial bounding box (6a) depending on the rotation parameter.
5. Method according to claim 4, characterized in that
- a bounding box algorithm is trained based on training images and the initial anchor boxes (8a); and
- the computing unit (3) is used to apply the trained bounding box algorithm to the image (5) to determine the initial bounding box (6a) by fitting the initial anchor box (8a) to the object (7).
6. Method according to claim 5, characterized in that the computing unit (3) is used to
- determine a reference vector (V) depending on a pose of the camera (4);
- construct a projection plane containing the reference vector (V) and the projection vector (P); and
- determine the rotation parameter depending on the projection plane.
7. Method according to claim 6, characterized in that the computing unit (3) is used to
- map the projection plane onto a line (11) in the image plane (10) depending on the mapping function;
- determine a tangent direction (12) to the line (11) at the initial point (P); and
- determine the rotation parameter depending on the tangent direction (12).
8. Method according to claim 6, characterized in that the computing unit (3) is used to
- determine a first auxiliary vector (P1) and a second auxiliary vector (P2), both auxiliary vectors (P1, P2) having the projection center point (C) as an origin, lying within the projection plane and including the same predefined angle (a) with the projection vector (P);
- map respective end points of both auxiliary vectors (P1, P2) onto respective mapped points (P1, P2) on the image plane (10) depending on the mapping function; and
- determine the rotation parameter depending on a straight line connecting the mapped points (P1, P2) to each other.
9. Method according to claim 7, characterized in that
- the computing unit (3) is used to determine the anchor box (8b) by rotating the initial anchor box (8a) depending on the rotation parameter; and
- the anchor box (8b) has the shape of a rectangle, wherein a side of the rectangle is parallel to the tangent direction (12).
10. Method according to claim 7, characterized in that
- the computing unit (3) is used to determine the bounding box (6b) by rotating the initial bounding box (6a) depending on the rotation parameter; and
- the bounding box (6b) has the shape of a rectangle, wherein a side of the rectangle is parallel to the tangent direction (12).
11. Method for guiding a vehicle at least in part automatically, wherein
- a camera (4) of the vehicle (1) is used to generate an image (5) depicting an environment of the vehicle (1); and
- the vehicle (1) is guided at least in part automatically depending on a bounding box (6b) for an object (7) on the image (5);
characterized in that a method for object detection according to one of claims 1 to 10 is carried out to determine the bounding box (6b).
12. Electronic vehicle guidance system comprising a computing unit (3), which is configured to
- receive an image (5) from a camera (4) and to divide the image (5) into a plurality of cells and select one of the cells; and
- determine a bounding box (6b) for an object (7) on the image (5) by fitting an anchor box (8b) to the object (7), wherein at least a part of the object (7) is located in the selected cell;
characterized in that the computing unit (3) is configured to
- retrieve a rotation parameter from a storage medium (9), wherein the rotation parameter depends on a location of the selected cell within the image (5); and
- determine the anchor box (8b) by rotating a predefined initial anchor box (8a) depending on the rotation parameter or determine the bounding box (6b) by rotating an initial bounding box (6a) depending on the rotation parameter,
the computing unit (3) being further configured to
- select an initial point (P), which is associated to the selected cell, on an image plane (10) of the camera (4);
- generate a projection vector (P) pointing from a projection center point (C) of the camera (4) to a projection point, wherein a mapping function of the camera (4) maps the projection point to the initial point (P); and
- determine the rotation parameter depending on the projection vector (P).
13. Electronic vehicle guidance system according to claim 12, characterized in that the vehicle guidance system (2) comprises the camera (4) and the camera (4) is designed as a non-rectilinear camera, in particular as a fisheye camera.
14. Computer program product comprising instructions, which,
- when executed by a computer system, cause the computer system to carry out a method according to one of claims 1 to 10; and/or
- when executed by an electronic vehicle guidance system (2) according to one of claims 12 or 13, cause the vehicle guidance system (2) to carry out a method according to one of claims 1 to 11.