CN117557616B - Method, device and equipment for determining pitch angle and estimating depth of monocular camera - Google Patents


Info

Publication number
CN117557616B
CN117557616B (application CN202410038590.0A)
Authority
CN
China
Prior art keywords
monocular camera
area
detected
region
determining
Prior art date
Legal status
Active
Application number
CN202410038590.0A
Other languages
Chinese (zh)
Other versions
CN117557616A (en
Inventor
王贝贝
张燕咏
吉建民
Current Assignee
Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Original Assignee
Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority date
Filing date
Publication date
Application filed by Institute of Artificial Intelligence of Hefei Comprehensive National Science Center filed Critical Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority to CN202410038590.0A priority Critical patent/CN117557616B/en
Publication of CN117557616A publication Critical patent/CN117557616A/en
Application granted granted Critical
Publication of CN117557616B publication Critical patent/CN117557616B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/20 Analysis of motion
    • G06T 7/215 Motion-based segmentation
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure belongs to the technical field of computer vision, and in particular relates to a method, a device and equipment for determining the pitch angle of a monocular camera and estimating depth. The method for determining the pitch angle between a monocular camera and a road-surface region to be detected comprises the following steps. The monocular camera is designed on the pinhole imaging principle, is deployed at the roadside, and its roll angle with respect to the region to be detected is not greater than a preset value. Pictures of the region to be detected are acquired at different moments from the same viewing angle with the monocular camera; track points of moving objects in the region to be detected are obtained from the pictures; and a feature extraction region within the region to be detected is determined from the track points. Vanishing points are then detected within the feature extraction region, and the pitch angle between the monocular camera and the region to be detected is determined from the position of the vanishing point. The monocular camera depth estimation scheme has low latency and is easy to deploy.

Description

Method, device and equipment for determining pitch angle and estimating depth of monocular camera
Technical Field
The disclosure belongs to the technical field of computer vision, and in particular relates to a method, a device and equipment for determining the pitch angle of a monocular camera and estimating depth.
Background
When a typical high-definition camera sensor collects pixel data, 3D spatial information is compressed onto a 2D plane, so the distance and depth information of a target relative to the sensor is lost. Therefore, in application scenarios requiring depth information, such as actual traffic-supervision applications, the depth information of the target needs to be recovered.
Currently, for a monocular camera, high-precision depth information of a target is mainly acquired with deep-learning methods that directly predict the position and orientation of an object in the three-dimensional camera coordinate system. However, the deep neural network structures of such algorithms are generally complex: although detection accuracy is high, inference is computationally heavy and latency is large, which hinders capturing target information in real time. Compared with a 2D object detection algorithm on the pixel plane, a 3D perception model also places higher demands on the hardware platform, so deployment cost is higher. Moreover, the generalization of 3D perception models is generally poor: when the equipment or the deployment site changes, a large amount of annotated data is often needed again for model training, making the model hard to replicate and unfavorable for algorithm migration and practical deployment.
Disclosure of Invention
The embodiments of the disclosure provide a monocular camera depth estimation scheme to address the high latency and difficult deployment of existing monocular camera depth estimation schemes.
A first aspect of the disclosed embodiments provides a method for determining the pitch angle between a monocular camera and a road-surface region to be detected, where the monocular camera is designed on the pinhole imaging principle, is deployed at the roadside, and its roll angle with respect to the region to be detected is not greater than a preset value. The method includes:
acquiring pictures of the region to be detected at different moments from the same viewing angle with the monocular camera, obtaining track points of moving objects in the region to be detected from the pictures, and determining a feature extraction region within the region to be detected from the track points;
and detecting vanishing points within the feature extraction region, and determining the pitch angle between the monocular camera and the region to be detected from the position of the vanishing point.
In some embodiments, obtaining the track points of moving objects in the region to be detected from the pictures includes:
performing 2D object detection on the pictures with a deep-learning network model to obtain the 2D bounding box of a target, and taking the pixel at the center of the bounding box's lower edge as the target's track point in that picture;
obtaining the target's track points in the pictures at other moments through 2D object tracking;
and collecting the track points of multiple targets across the pictures at different moments.
In some embodiments, determining the feature extraction region within the region to be detected from the track points includes:
when the number of track points exceeds a preset threshold, determining the left and right edges of a rectangular region from the maximum and minimum abscissas of the track points, determining its upper and lower edges from the maximum and minimum ordinates of the track points, and reserving a preset width between each edge of the rectangular region and the corresponding picture edge as a buffer space;
and taking the lower edge of the rectangular region as the lower edge of a triangular region, taking the mean abscissa of the track points forming the rectangular region's upper edge as the apex of the triangular region, and using the triangular region as the feature extraction region.
In some embodiments, detecting vanishing points within the feature extraction region includes:
detecting straight-line features of the picture within the rectangular region with a line-feature detection algorithm;
and detecting vanishing points based on the straight lines within the triangular region.
In some embodiments, detecting vanishing points based on the straight lines within the triangular region includes:
for each picture, obtaining all intersections formed by pairwise intersecting the straight lines within the triangular region; for each intersection, computing the sum of squared distances from the intersection to every straight line within the triangular region; and taking the intersection with the smallest sum of squared distances as the picture's vanishing point;
and filtering the vanishing points of the pictures at different moments with a linear Kalman filter to obtain a stable vanishing point.
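The intersection vote described here can be sketched in a few lines of Python (an illustrative implementation, not the patent's code): every pair of segments is intersected, and the intersection minimizing the sum of squared distances to all lines is returned.

```python
import itertools
import math

def _intersect(l1, l2):
    """Intersection of the infinite lines through two segments, or None if parallel."""
    (x1, y1), (x2, y2) = l1
    (x3, y3), (x4, y4) = l2
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(d) < 1e-9:  # (nearly) parallel in the image
        return None
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / d,
            (a * (y3 - y4) - (y1 - y2) * b) / d)

def _dist2(pt, line):
    """Squared distance from pt to the infinite line through the segment."""
    (x1, y1), (x2, y2) = line
    num = (y2 - y1) * pt[0] - (x2 - x1) * pt[1] + x2 * y1 - y2 * x1
    return num * num / ((y2 - y1) ** 2 + (x2 - x1) ** 2)

def vanishing_point(lines):
    """Pairwise intersection minimizing the sum of squared distances to all lines."""
    best, best_cost = None, math.inf
    for l1, l2 in itertools.combinations(lines, 2):
        p = _intersect(l1, l2)
        if p is None:
            continue
        cost = sum(_dist2(p, l) for l in lines)
        if cost < best_cost:
            best, best_cost = p, cost
    return best
```

For a handful of lane-line segments per frame the quadratic pairwise loop is cheap; the per-frame result would then be smoothed across frames as described.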
In some embodiments, determining the pitch angle of the monocular camera to the area to be detected based on the position of the vanishing point comprises:
α = arctan((c_v − v_∞) / f_v)
where α is the pitch angle, v_∞ is the ordinate of the vanishing point, f_v is the focal length in the vertical direction, and c_v is the ordinate of the optical center.
A second aspect of the disclosed embodiments provides a monocular camera depth estimation method, where the monocular camera is designed on the pinhole imaging principle, is deployed at the roadside, and its roll angle with respect to the road-surface region to be detected is not greater than a preset value. The method includes:
performing 2D object detection on the pictures of the region to be detected acquired by the monocular camera to obtain the target's coordinates on the pixel plane;
determining the pitch angle between the monocular camera and the region to be detected with the method of the first aspect of the disclosure, and obtaining the monocular camera's height relative to the road surface and its camera intrinsics;
and estimating the target's depth from its coordinates on the pixel plane together with the pitch angle, the height, and the camera intrinsics.
A third aspect of the disclosed embodiments provides a device for determining the pitch angle between a monocular camera and a road-surface region to be detected, where the monocular camera is designed on the pinhole imaging principle, is deployed at the roadside, and its roll angle with respect to the region to be detected is not greater than a preset value. The device includes:
a detection module, configured to acquire pictures of the region to be detected at different moments from the same viewing angle with the monocular camera, obtain track points of moving objects in the region to be detected from the pictures, and determine a feature extraction region within the region to be detected from the track points;
and a determining module, configured to detect vanishing points within the feature extraction region and determine the pitch angle between the monocular camera and the region to be detected from the position of the vanishing point.
A fourth aspect of the disclosed embodiments provides a monocular camera depth estimation device, where the monocular camera is designed on the pinhole imaging principle, is deployed at the roadside, and its roll angle with respect to the road-surface region to be detected is not greater than a preset value. The device includes:
a target detection module, configured to perform 2D object detection on the pictures of the region to be detected acquired by the monocular camera to obtain the target's coordinates on the pixel plane;
a parameter acquisition module, configured to determine the pitch angle between the monocular camera and the region to be detected with the method of the first aspect of the disclosure, and to obtain the monocular camera's height relative to the road surface and its camera intrinsics;
and a depth estimation module, configured to estimate the target's depth from its coordinates on the pixel plane together with the pitch angle, the height, and the camera intrinsics.
A fifth aspect of the disclosed embodiments provides an electronic device comprising a memory and a processor,
the memory being used to store a computer program;
and the processor being configured to implement the methods of the first and second aspects of the present disclosure when executing the computer program.
In summary, the method, device and equipment provided by the embodiments of the present disclosure for determining the monocular camera's pitch angle and estimating depth exploit the perspective principle, namely that two lines parallel to each other in the physical world meet at a vanishing point in the pixel plane, together with the relationship between the vanishing point in the pixel plane and the camera's attitude angle in the physical world (i.e., the road coordinate system). The vanishing point can therefore be computed directly from the road-surface pictures obtained by the monocular camera, the camera pitch angle is computed from the vanishing point, the target position is then obtained through 2D detection, and the target depth is computed in combination with the camera height. Because 2D object detection algorithms generally have high generalization and accuracy, and because the scheme rests on geometric principles and closed-form mathematics and can be integrated directly into the camera driver, no extra computing resources are needed; hence the low latency and easy deployment.
Drawings
The features and advantages of the present disclosure will be more clearly understood by reference to the accompanying drawings, which are schematic and should not be construed as limiting the disclosure in any way, in which:
FIG. 1 is a schematic diagram of a computer system to which the present disclosure is applicable;
FIG. 2 is a schematic diagram of a coordinate system of a camera deployed on a roadside traffic pole/tower and a road surface to which the present disclosure is applicable;
FIG. 3 is a schematic diagram of a depth estimation principle based on a pinhole imaging model to which the present disclosure is applicable;
FIG. 4 is a schematic illustration of vanishing points and parallel lines applicable to the present disclosure;
FIG. 5 is a flow chart illustrating a method of determining a single camera to road pitch angle according to some embodiments of the present disclosure;
FIG. 6 is a feature extraction effect diagram shown in accordance with some embodiments of the present disclosure;
FIG. 7 is a graph of the effect after vanishing point filtering according to some embodiments of the present disclosure;
FIG. 8 is a schematic diagram of line detection shown in accordance with some embodiments of the present disclosure;
FIG. 9 is a flow chart of a monocular camera depth estimation method shown in accordance with some embodiments of the present disclosure;
FIG. 10 is a schematic illustration of an apparatus for determining the pitch angle of a monocular camera to an area to be inspected on a road surface, according to some embodiments of the present disclosure;
FIG. 11 is a schematic diagram of a monocular camera depth estimation device shown according to some embodiments of the present disclosure;
fig. 12 is a schematic diagram of a monocular camera depth estimation device shown according to some embodiments of the present disclosure.
Detailed Description
In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant disclosure. However, it will be apparent to one of ordinary skill in the art that the present disclosure may be practiced without these specific details. It should be appreciated that the terms "system," "apparatus," "unit," and/or "module" are used in this disclosure as one way of distinguishing different parts, elements, portions, or components at different levels. These terms may be replaced by other expressions that achieve the same purpose.
It will be understood that when a device, unit, or module is referred to as being "on," "connected to," or "coupled to" another device, unit, or module, it can be directly on, connected to, or coupled to, or in communication with the other device, unit, or module, or intervening devices, units, or modules may be present unless the context clearly indicates an exception. For example, the term "and/or" as used in this disclosure includes any and all combinations of one or more of the associated listed items.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit its scope. As used in the specification and the claims, the terms "a," "an," and/or "the" do not denote the singular and may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" indicate only that the explicitly identified features, integers, steps, operations, elements, and/or components are included, without excluding others; they do not constitute an exclusive list.
These and other features and characteristics of the present disclosure, as well as the methods of operation, functions of the related elements of structure, combinations of parts and economies of manufacture, may be better understood with reference to the following description and the accompanying drawings, all of which form a part of this specification. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the disclosure. It will be understood that the figures are not drawn to scale.
Various block diagrams are used in the present disclosure to illustrate various modifications of the embodiments according to the present disclosure. It should be understood that the foregoing or following structures are not intended to limit the present disclosure. The protection scope of the present disclosure is subject to the claims.
FIG. 1 is a schematic diagram of a computer system to which the present disclosure is applicable. In the system shown in FIG. 1, the depth calculation server has a data connection to a monocular image acquisition device deployed on a roadside traffic pole/tower; the monocular image acquisition device acquires pictures of the region to be detected, and the depth calculation server calculates the depth information of target objects in the pictures. Here:
the monocular image acquisition device refers to a monocular image acquisition device based on a pinhole imaging model, such as a monocular camera for road monitoring purposes. In some embodiments of the present disclosure, the monocular image capturing device refers to a monocular camera or a monocular camera, a monitoring device, which may capture video. In other embodiments of the present disclosure, the monocular image capturing apparatus refers to a monocular camera (hereinafter referred to as a monocular image capturing apparatus by camera) that can capture a plurality of pictures of an area to be detected at different times.
The depth computing server may be a stand-alone device or may be a network server based on network connection. In particular, the depth computing server may be a software or hardware module deployed on the camera. In the case of a stand-alone device, the depth computing server may be any of a stand-alone, clustered, or distributed server.
FIG. 2 is a schematic diagram of the coordinate systems of a camera deployed on a roadside traffic pole/tower and the road surface. As shown in FIG. 2, the origin O of the camera coordinate system O-XYZ is at the camera's optical center; the Z axis is parallel to the camera's optical axis and points out of the camera; the X axis points horizontally to the right; and the Y axis completes a right-handed coordinate system with the other axes. The origin O' of the road-surface coordinate system O'-X'Y'Z' is at the base of the roadside pole; the Z' axis is the projection of the camera coordinate system's Z axis onto the local ground; the X' axis lies in the road surface, perpendicular to the Z' axis; and the Y' axis forms a right-handed system with X' and Z'.
The attitude angles affecting the camera's depth estimation of a target have two degrees of freedom in total: the rotation angle α of the camera coordinate system's Z axis relative to the road-surface coordinate system's Z' axis is the pitch angle, with positive direction given by the right-hand rule; and the rotation angle γ of the camera coordinate system's X axis relative to the road-surface coordinate system's X' axis is the roll angle. Intuitively, the pitch angle corresponds to how far the camera tilts down toward the road surface, and the roll angle to how far it tilts sideways. The present disclosure applies to scenes where the roll angle is below a preset value that can be approximated as zero; that is, scenes where the camera is tilted down but not rolled.
A schematic diagram of depth estimation based on the pinhole imaging model is shown in FIG. 3, where (x, y, z) are the coordinates of point P in the camera's three-dimensional coordinate system and (u, v) are the coordinates of its image point P' on the pixel plane.
The coordinates (u, v) of the target's image point on the pixel plane are related to its coordinates (x, y, z) in the camera coordinate system as follows:
z · [u, v, 1]^T = K · [x, y, z]^T, with K = [ f_u, 0, c_u ; 0, f_v, c_v ; 0, 0, 1 ]
where K is the camera intrinsic matrix, determined by the four parameters (f_u, f_v, c_u, c_v); these are generally known, and estimating K is not involved here. z denotes the target depth.
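As a quick numeric illustration of the pinhole relation (the intrinsic values used below are illustrative, not from the patent):

```python
# Numeric illustration of the pinhole relation z * (u, v, 1)^T = K * (x, y, z)^T:
# dividing by the depth z and adding the principal point gives pixel coordinates.

def project(x, y, z, fu, fv, cu, cv):
    """Project a camera-frame point (x, y, z), z > 0, to pixel coordinates (u, v)."""
    return (fu * x / z + cu, fv * y / z + cv)
```

For example, with f_u = f_v = 1000 and principal point (640, 360), the point (1, 2, 10) projects to (740, 560).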
To recover the two-dimensional information of the ground plane, a transformation matrix from the ground plane to the camera plane is also needed. Define the ground plane as the X'O'Z' plane of the road coordinate system; the conversion from a point in the road coordinate system to a point in the image coordinate system is:
s · [u, v, 1]^T = K · [R | t] · [x', 0, z', 1]^T
the transformation matrix from the camera coordinate system to the road surface coordinate system can be disassembled into:
wherein the method comprises the steps ofA translation vector representing the camera coordinate system to the road surface coordinate system, wherein +.>The distance from the origin of the camera coordinate system to the ground plane is the camera mounting height. R represents the rotation matrix of the camera coordinate system to the road surface coordinate system, expressed mathematically as follows, where α, γ are the pitch angle (pitch) and roll angle (roll) of the camera, respectively:
R is an orthogonal matrix, so R^{-1} = R^T. Based on the pinhole imaging principle, when the roll angle is approximately zero, the conversion relation above yields:
z = h · (f_v - (v - c_v) · tan α) / ((v - c_v) + f_v · tan α)
where h is the distance from the origin of the camera coordinate system to the ground plane, i.e. the camera mounting height; α is the camera pitch angle; v is the ordinate of the center of the target's lower edge (this point is taken as the target's contact point with the ground); z is the target depth; and f_v and c_v are camera intrinsics, c_v being the ordinate of the optical center's projection on the pixel plane and f_v the focal length in the vertical direction.
Therefore, once the camera pitch angle is obtained, the target depth is easily calculated. The present disclosure uses vanishing points to determine the camera pitch angle.
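A minimal sketch of this depth recovery, assuming the standard flat-ground closed form under the pinhole model described above (the function name and parameter layout are illustrative, not the patent's verbatim formula):

```python
import math

# Flat-ground depth recovery: v is the ordinate of the target's ground-contact
# point (lower-edge midpoint of its bounding box), alpha the pitch angle
# (radians, positive looking down), h the camera mounting height, and fv / cv
# the vertical focal length and optical-center ordinate. Assumes zero roll.

def ground_depth(v, alpha, h, fv, cv):
    dv = v - cv
    return h * (fv - dv * math.tan(alpha)) / (dv + fv * math.tan(alpha))
```

Round-tripping a synthetic ground point through the forward projection and back through this formula recovers the original distance, which is a convenient sanity check when tuning a deployment.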
According to geometric perspective, lines that are parallel in the 3D world (i.e., the camera coordinate system) intersect after projective transformation; the intersection is the vanishing point, and lines that are mutually parallel in the 3D world share the same vanishing point on the 2D plane. The vanishing points of all sets of parallel lines lie on one straight line, the horizon, as shown in FIG. 4. Vanishing points can therefore be obtained by detecting parallel-line features in the ground feature region, for example by detecting lane lines.
A point in the three-dimensional road-surface coordinate system O'-X'Y'Z' maps to its image point in the two-dimensional image coordinate system C-UV via the three-dimensional camera coordinate system O-XYZ as follows:
s · [u, v, 1]^T = K · [R | t] · [x', y', z', 1]^T
Consider a straight line x' = k·z' + b on the ground plane (y' = 0); its corresponding points (u, v) after projection onto the image satisfy:
s · [u, v, 1]^T = K · [R | t] · [k·z' + b, 0, z', 1]^T
Let z' → ∞; the coordinates of the corresponding image point at infinity depend only on the slope of the line:
u_∞ = c_u + f_u · (k·cos γ + sin α·sin γ) / cos α
v_∞ = c_v + f_v · (k·sin γ - sin α·cos γ) / cos α
Solving the first equation for k gives:
k = (cos α · (u_∞ - c_u) / f_u - sin α · sin γ) / cos γ
Eliminating k yields the equation of the horizon line in image coordinates:
v = c_v + (f_v / f_u) · tan γ · (u - c_u) - f_v · tan α / cos γ
When the camera's roll angle is considered negligible, i.e. γ = 0, the above reduces to:
v_∞ = c_v - f_v · tan α
Therefore, only the ordinate v_∞ of the vanishing point on the pixel plane, together with c_v (the ordinate of the optical center's projection on the pixel plane) and f_v (the focal length in the vertical direction), is needed to compute the camera pitch angle α:
α = arctan((c_v - v_∞) / f_v)
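Under the zero-roll relation just derived, the final pitch computation is a one-liner (illustrative sketch):

```python
import math

# With negligible roll, the horizon ordinate satisfies v_inf = cv - fv * tan(alpha),
# so the pitch angle follows directly from the filtered vanishing point.

def pitch_from_vanishing_point(v_inf, fv, cv):
    return math.atan2(cv - v_inf, fv)
```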
FIG. 5 is a flow chart of a method of determining the pitch angle between a monocular camera and the road, according to some embodiments of the present disclosure; the monocular camera is designed on the pinhole imaging principle, is deployed on a roadside traffic pole or tower, and its roll angle with respect to the road surface is not greater than a preset value. In some embodiments, the method is performed by the depth calculation server in the system shown in FIG. 1 and comprises the following steps:
s510, acquiring pictures of the road surface to-be-detected area at the same visual angle and at different moments based on the monocular camera, acquiring track points of the moving object in the to-be-detected area based on the pictures, and determining a feature extraction area in the to-be-detected area based on the track points.
S520, detecting vanishing points in the feature extraction area, and determining pitch angles of the monocular camera and the area to be detected based on the positions of the vanishing points.
Specifically, feature region extraction is performed first.
Because the track points of moving objects are concentrated in a certain area of the picture rather than spread over the whole picture (for example, a vehicle cannot drive onto a treetop), for the time-series image data and 2D detection results at a given viewing angle, the feature-region extraction module outputs a rectangular region for subsequent line-feature detection and a triangular region for further feature screening, so as to minimize feature noise and filter out the influence of roadside objects. The module's processing flow is as follows:
2D target detection: first, 2D object detection and 2D object tracking are performed on the input pictures to obtain each target's 2D bounding box (or the 2D perception results are read directly from a sensor). The pixel coordinates of the center of the bounding box's lower edge are easy to obtain and are taken as the target point, and the track points of multiple moving targets on the image are accumulated continuously (the points in FIG. 6 are the historical track points detected in the images).
When the accumulated track points reach a certain number, the module performs a statistical analysis over them (considering the maximum and minimum coordinates of the track points in the horizontal and vertical directions, a preset picture-boundary protection threshold, and so on) to obtain a rectangular region of the road surface of interest (the rectangular box in FIG. 6), and simultaneously outputs a triangular region (the triangular box in FIG. 6).
Specifically, the left edge and the right edge of the rectangular area are determined based on the maximum value and the minimum value of the abscissa of the track point, the upper edge and the lower edge of the rectangular area are determined based on the maximum value and the minimum value of the ordinate of the track point, and a preset width is reserved between each edge of the rectangular area and each edge of the picture to serve as a buffer space;
and taking the lower edge of the rectangular area as the lower edge of the triangular area, and taking the average value of the abscissa of the track points forming the upper edge of the rectangular area as the vertex of the triangular area.
The buffer space is provided to avoid noise features that may exist at the picture edges: a roadside monitoring camera usually faces the road surface, and the closer to the picture edge, the more likely buildings or trees appear. The buffer amounts to shrinking the rectangular box somewhat, which likewise reduces the chance of including edge objects.
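The rectangle-plus-triangle construction above can be sketched as follows (an illustrative implementation; the margin and the minimum point count are assumed values, not the patent's):

```python
# Build the rectangular and triangular feature-extraction regions from
# accumulated track points. Pixel v grows downward, so min(vs) is the top edge.

def feature_regions(points, img_w, img_h, margin=20, min_points=50):
    """points: list of (u, v) trajectory points. Returns (rect, tri) or None.

    rect = (left, top, right, bottom); tri = [apex, bottom-left, bottom-right].
    """
    if len(points) <= min_points:
        return None
    us = [p[0] for p in points]
    vs = [p[1] for p in points]
    # Rectangle from coordinate extremes, clamped so every edge keeps a
    # buffer of `margin` pixels from the picture border.
    left = max(min(us), margin)
    right = min(max(us), img_w - 1 - margin)
    top = max(min(vs), margin)
    bottom = min(max(vs), img_h - 1 - margin)
    rect = (left, top, right, bottom)
    # Triangle: base on the rectangle's lower edge; apex on the upper edge at
    # the mean abscissa of the points forming that edge.
    band = [u for (u, v) in points if v <= top + margin]
    apex_u = sum(band) / len(band) if band else (left + right) / 2
    tri = [(apex_u, top), (left, bottom), (right, bottom)]
    return rect, tri
```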
FIG. 6 shows the feature extraction effect. A is the original camera picture. B is the line-feature detection result without feature-region restriction: the white lines are detected lane-line features, and the non-lane-line features marked by curves are feature noise, notably on the bus at the left and the road at the right. C is the feature-region extraction result, where the points are the accumulated historical track points of the detected targets, and the rectangular and triangular boxes are the obtained feature extraction regions. D is the line-feature detection result within the feature region: the straight lines are detected lane-line features, and the feature noise is visibly much reduced.
The rectangular box serves to crop the picture: line-feature detection is then performed only within the rectangular box; in other words, the algorithm ignores the part of the picture outside it. The detected line features are then further screened with the triangular region, and only the line features inside the triangle are kept, to reduce noise. Line-feature detection cannot be run directly on the triangular region, because current line-feature detectors accept only rectangular image inputs; hence the two-stage approach: crop the original picture with the rectangular box, then screen the line features with the triangular box.
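The second-stage screening can be sketched with the usual same-side (cross-product sign) point-in-triangle test; this is illustrative code, not the patent's, and the rule of keeping a segment only when both endpoints lie inside the triangle is an assumption:

```python
def _cross(o, a, b):
    """2-D cross product of vectors (a - o) and (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_triangle(p, tri):
    a, b, c = tri
    s1, s2, s3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    # p is inside (or on an edge) when all three cross products share a sign
    return (s1 >= 0 and s2 >= 0 and s3 >= 0) or (s1 <= 0 and s2 <= 0 and s3 <= 0)

def screen_segments(segments, tri):
    """Keep only segments with both endpoints inside the triangular region."""
    return [s for s in segments if in_triangle(s[0], tri) and in_triangle(s[1], tri)]
```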
Line features and vanishing points are then detected through the following steps:
a) Convert the region inside the rectangular box to a grayscale image and apply Gaussian blur: the color image is first cropped to the feature region and then converted into a single-channel grayscale image whose pixel values lie between 0 and 255. A Gaussian blur function then smooths the grayscale image, reducing its noise and fine detail;
b) Grayscale image binarization: the blurred grayscale image is thresholded with a threshold function to generate a binary image containing only the two pixel values 0 and 255, separating the target objects from the background;
c) Canny edge detection: a Canny edge detection operator identifies the edge structures in the binary image, preserving edge sharpness while suppressing noise;
d) Hough transform line detection: the probabilistic Hough transform detects straight lines in the image, the Hough accumulator is thresholded to determine which candidates represent real lines, and the parameters of each detected line are extracted, each segment being represented and stored by the coordinates of its two endpoints;
e) Vanishing point determination: the common intersection point of the straight lines inside the triangular region is taken as the vanishing point of the image. The algorithm iterates over every pair of straight lines to obtain their intersection points, computes the distance from each intersection point to all the straight lines, and selects the intersection point with the smallest sum of squared distances to all lines as the vanishing point of the image.
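Step e) can be sketched in plain Python. Representing each segment as a normalized line ax + by + c = 0 (with a² + b² = 1) makes the perpendicular distance from a point simply |ax + by + c|, so the sum of squared distances is easy to accumulate; the segment format (x1, y1, x2, y2) is an assumption.

```python
import itertools
import math

def line_through(seg):
    # Normalized line ax + by + c = 0 through the segment's two endpoints.
    x1, y1, x2, y2 = seg
    a, b = y2 - y1, x1 - x2
    n = math.hypot(a, b)
    a, b = a / n, b / n
    return a, b, -(a * x1 + b * y1)

def intersect(l1, l2):
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    d = a1 * b2 - a2 * b1
    if abs(d) < 1e-9:
        return None  # parallel lines: no intersection point
    return ((b1 * c2 - b2 * c1) / d, (a2 * c1 - a1 * c2) / d)

def vanishing_point(segments):
    # Candidates are the pairwise intersections; the winner minimizes the
    # sum of squared perpendicular distances to all lines.
    lines = [line_through(s) for s in segments]
    best, best_cost = None, float("inf")
    for l1, l2 in itertools.combinations(lines, 2):
        p = intersect(l1, l2)
        if p is None:
            continue
        cost = sum((a * p[0] + b * p[1] + c) ** 2 for a, b, c in lines)
        if cost < best_cost:
            best, best_cost = p, cost
    return best
```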
f) Vanishing point filtering: the sequence of vanishing points is filtered with a linear Kalman filter to remove noise in the observed data. A suitable initial state is first chosen empirically, which improves the convergence speed while preserving stability. The process noise covariance matrix and the observation noise covariance matrix are then adjusted according to the test results, ensuring that the Kalman filter yields smooth state estimates while retaining tracking ability. In FIG. 7 the vanishing point coordinates before and after filtering are shown as purple and red points respectively, and the filtered vanishing point gradually converges.
The filtering step is required in the present disclosure because line detection and vanishing point computation are re-run for every frame, and normal fluctuations of the pixel values may make the line detection result of the current frame differ slightly from that of the previous frame, yielding a slightly different vanishing point. Filtering prevents the distance and depth calculation results from fluctuating with the vanishing point position. The effect of the vanishing point filtering is shown in FIG. 7.
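A minimal sketch of such a filter follows, using one scalar constant-position Kalman filter per vanishing-point coordinate. The noise covariances q and r below are illustrative assumptions, since the disclosure only states that they are tuned empirically.

```python
class ScalarKalman:
    """1-D constant-position Kalman filter; run one instance per vanishing-point axis.
    q (process noise) and r (observation noise) are tuning assumptions."""
    def __init__(self, x0, p0=1.0, q=1e-3, r=1.0):
        self.x, self.p, self.q, self.r = x0, p0, q, r

    def update(self, z):
        self.p += self.q                  # predict: state assumed constant, uncertainty grows
        k = self.p / (self.p + self.r)    # Kalman gain
        self.x += k * (z - self.x)        # correct with the new vanishing-point observation
        self.p *= (1.0 - k)
        return self.x
```

A larger q lets the estimate track a drifting vanishing point faster; a larger r smooths more aggressively.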
Fig. 8 shows straight line detection based on the Canny operator and the Hough transform, where A, B, C and D are respectively the grayscale image, the binary image, the Canny edge features, and the straight lines detected by the Hough transform.
Finally, the pitch angle of the camera is calculated with the following formula:

α = arctan((c_y − v₀) / f_y)

wherein v₀ is the ordinate of the vanishing point in the pixel plane, and c_y and f_y are camera intrinsic parameters: c_y is the ordinate of the projection of the optical center on the pixel plane, and f_y is the focal length in the vertical direction.
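As an illustrative sketch, assuming the usual convention that the pixel v axis points downward (so that a downward-pitched camera places the road vanishing point above the principal point), the pitch angle can be computed as:

```python
import math

def pitch_from_vanishing_point(v0, cy, fy):
    # Pitch angle (radians) of a downward-tilted camera from the vanishing point
    # ordinate v0, optical-center ordinate cy, and vertical focal length fy.
    # Sign convention (an assumption): v grows downward, so v0 < cy for pitch > 0.
    return math.atan2(cy - v0, fy)
```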
Fig. 9 is a flow chart of a monocular camera depth estimation method according to some embodiments of the present disclosure, the monocular camera being designed based on pinhole imaging principles, deployed on a road-side traffic pole or tower and having a deflection angle with the road surface less than a preset value. In some embodiments, a monocular camera depth estimation method is performed by a depth calculation server in the system shown in fig. 1, the monocular camera depth estimation method comprising the steps of:
s910: and 2D target detection is carried out on the picture of the region to be detected, which is acquired by the monocular camera, so that the coordinates of the target on the pixel plane are obtained.
S920: and determining the pitch angle of the monocular camera and the area to be detected based on the method in S510-S520 in fig. 5, and acquiring the height of the monocular camera relative to the road surface and the camera internal parameters of the monocular camera.
S930: estimating the depth of the target based on the coordinates of the target in the pixel plane and the pitch angle, the height and the camera internal parameters.
Specifically, a deep learning network model or a traditional target detection algorithm, such as a YOLO network model or a Gaussian mixture model (GMM) based algorithm, detects the 2D bounding box of each target in the picture of the region to be detected acquired by the monocular camera, and the coordinates of the center of the lower edge of the bounding box are taken as the coordinates of the target on the pixel plane.
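Extracting this pixel coordinate from a detector output is a one-liner; the (x1, y1, x2, y2) box format below is an assumption.

```python
def target_pixel(bbox):
    # (x1, y1, x2, y2) 2D box -> center of the lower edge,
    # taken as the target's ground contact point on the pixel plane.
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2.0, y2)
```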
The pitch angle between the monocular camera and the area to be detected is calculated based on the method of S510-S520 in fig. 5, and the height of the monocular camera relative to the road surface and the camera intrinsic parameters of the monocular camera are acquired.
Finally, the depth of the target is determined based on the following formula:

Z = h / tan(α + arctan((v − c_y) / f_y))

wherein h is the camera mounting height, i.e. the distance from the origin of the camera coordinate system to the ground plane; α is the camera pitch angle; v is the vertical component of the coordinates of the center of the lower edge of the target (this point being taken as the contact point between the target and the ground); Z is the target depth; and f_y and c_y are camera intrinsic parameters, wherein c_y is the ordinate of the projection of the optical center on the pixel plane and f_y is the focal length in the vertical direction.
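A sketch of this flat-ground depth calculation, assuming the pixel v axis points downward and angles are in radians:

```python
import math

def estimate_depth(v, h, alpha, fy, cy):
    """Flat-ground depth from the bottom-edge-center pixel row v of a 2D box.
    v: pixel ordinate of the ground contact point, h: camera height above the road,
    alpha: pitch angle (rad), fy/cy: camera intrinsics."""
    beta = math.atan2(v - cy, fy)   # ray angle below the optical axis
    angle = alpha + beta            # total angle below the horizontal
    if angle <= 0:
        return float("inf")         # at or above the horizon: ray never meets the ground
    return h / math.tan(angle)
```

Points lower in the image (larger v) yield smaller depths, as expected for a camera looking down a road.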
Fig. 10 is a schematic diagram of an apparatus for determining the pitch angle between a monocular camera and an area to be detected of a road surface, according to some embodiments of the present disclosure. As shown in fig. 10, the apparatus 1000 for determining the pitch angle between the monocular camera and the area to be detected of the road surface includes a detection module 1010 and a determining module 1020. The function of determining the pitch angle between the monocular camera and the area to be detected of the road surface may be performed by the depth calculation server in the system shown in fig. 1. Wherein:
the detection module 1010 is configured to obtain pictures of the to-be-detected region at the same viewing angle and different moments based on the monocular camera, obtain a track point of a moving object in the to-be-detected region based on the pictures, and determine a feature extraction region in the to-be-detected region based on the track point;
and a determining module 1020, configured to detect vanishing points in the feature extraction area, and determine pitch angles of the monocular camera and the area to be detected based on positions of the vanishing points.
Fig. 11 is a schematic diagram of a monocular camera depth estimation device, shown according to some embodiments of the present disclosure. As shown in fig. 11, the monocular camera depth estimation apparatus 1100 includes a target detection module 1110, a parameter acquisition module 1120, and a depth estimation module 1130. The monocular camera depth estimation function may be performed by a depth calculation server in the system shown in fig. 1. Wherein:
the target detection module 1110 is configured to perform 2D target detection on the image of the region to be detected, which is acquired by the monocular camera, to obtain coordinates of a target in a pixel plane;
a parameter obtaining module 1120, configured to determine a pitch angle between the monocular camera and the area to be detected based on the method described in S510-S520 in fig. 5, and obtain a height of the monocular camera relative to the road surface and a camera internal parameter of the monocular camera;
a depth estimation module 1130 for estimating a depth of the target based on coordinates of the target in a pixel plane and the pitch angle, the height, and the camera intrinsic parameters.
One embodiment of the present disclosure provides a monocular camera depth estimation apparatus. As shown in fig. 12, the monocular camera depth estimation apparatus 1200 includes a memory 1220 and a processor 1210, the memory 1220 storing a computer program; the processor 1210 is configured to implement the methods described in S510-S520 in fig. 5 or S910-S930 in fig. 9 when executing the computer program.
In summary, the method, apparatus and device for determining the pitch angle of a monocular camera and estimating depth provided by the embodiments of the present disclosure exploit the perspective principle, namely that two lines parallel to each other in the physical world meet at a vanishing point in the pixel plane. Based on the relationship between the vanishing point in the pixel plane and the pose angle of the camera in the physical world (i.e. the road coordinate system), the vanishing point can be computed directly from the road surface picture obtained by the monocular camera, and the camera pitch angle can then be computed from the vanishing point. The target position is then obtained by 2D detection, and the target depth is calculated by combining it with the camera height information. Because 2D target detection algorithms generally offer high generalization and accuracy, and the scheme rests on geometric principles and closed-form formulas that can be integrated directly into the camera driver, no extra computing resources are needed; the latency is therefore low and deployment is easy.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific operation of the apparatus and modules described above may refer to the corresponding description in the foregoing apparatus embodiments, which is not repeated here.
While the subject matter described herein is provided in the general context of operating systems and application programs that execute in conjunction with the execution of a computer system, those skilled in the art will recognize that other implementations may also be performed in combination with other types of program modules. Generally, program modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types. Those skilled in the art will appreciate that the subject matter described herein may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like, as well as distributed computing environments that have tasks performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
Those of ordinary skill in the art will appreciate that the elements and method steps of the examples described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
It is to be understood that the above-described embodiments of the present disclosure are merely illustrative or explanatory of the principles of the disclosure and are not restrictive of the disclosure. Accordingly, any modifications, equivalent substitutions, improvements, or the like, which do not depart from the spirit and scope of the present disclosure, are intended to be included within the scope of the present disclosure. Furthermore, the appended claims of this disclosure are intended to cover all such changes and modifications that fall within the scope and boundary of the appended claims, or the equivalents of such scope and boundary.

Claims (9)

1. A method for determining a pitch angle between a monocular camera and an area to be detected of a road surface, the monocular camera being designed based on the pinhole imaging principle and deployed at a road side, a roll angle between the monocular camera and the area to be detected being not greater than a preset value, characterized in that the method comprises:
acquiring pictures of the to-be-detected region at different moments in the same visual angle based on the monocular camera, acquiring track points of the to-be-detected region moving object based on the pictures, and determining a feature extraction region in the to-be-detected region based on the track points;
detecting vanishing points in the feature extraction area, and determining pitch angles of the monocular camera and the area to be detected based on the positions of the vanishing points; wherein,
the determining a feature extraction region in the region to be detected based on the track point comprises:
when the number of the track points is larger than a preset threshold value, determining left and right edges of a rectangular area based on the maximum and minimum values of the horizontal coordinates of the track points, determining upper and lower edges of the rectangular area based on the maximum and minimum values of the vertical coordinates of the track points, and reserving preset widths between each edge of the rectangular area and each edge of the picture as buffer spaces;
and taking the lower edge of the rectangular area as the lower edge of the triangular area, taking the average value of the abscissa of the track points forming the upper edge of the rectangular area as the vertex of the triangular area, and taking the triangular area as the feature extraction area.
2. The method according to claim 1, wherein the acquiring the trajectory point of the moving object in the region to be detected based on the picture includes:
performing 2D target detection on the picture based on a deep learning network model to obtain a 2D bounding box of the target, and taking the pixel at the center of the lower edge of the bounding box as the track point of the target in the picture;
obtaining track points of the target in other pictures at different moments based on 2D target tracking;
and acquiring track points of a plurality of targets in pictures at different moments.
3. The method of claim 1, wherein the detecting vanishing points within the feature extraction area comprises:
detecting straight line characteristics in the picture in the rectangular area by adopting a line characteristic detection algorithm;
vanishing points are detected based on the lines within the triangular region.
4. A method according to claim 3, wherein said detecting vanishing points based on straight lines within said triangular region comprises:
for each picture, acquiring all intersection points formed by intersecting the straight lines in the triangular region in pairs, calculating the distance square sum of the intersection points to each straight line in the triangular region for each intersection point, and taking the intersection point with the minimum distance square sum as a vanishing point of the picture;
and filtering all vanishing points corresponding to the pictures at different moments by using a linear Kalman filter to obtain stable vanishing points.
5. The method according to claim 1, wherein determining the pitch angle between the monocular camera and the area to be detected based on the position of the vanishing point comprises calculating:

α = arctan((c_y − v₀) / f_y)

wherein α is the pitch angle, v₀ is the ordinate of the vanishing point, f_y is the focal length in the vertical direction, and c_y represents the ordinate of the optical center.
6. A monocular camera depth estimation method, the monocular camera being designed based on the pinhole imaging principle and deployed at a road side, a roll angle between the monocular camera and an area to be detected of a road surface being not greater than a preset value, characterized in that the method comprises:
2D target detection is carried out on the picture of the region to be detected, which is acquired by the monocular camera, so that coordinates of the target in a pixel plane are obtained;
determining a pitch angle of the monocular camera and the area to be detected based on the method of any one of claims 1-5, and acquiring a height of the monocular camera relative to the road surface and a camera internal reference of the monocular camera;
estimating the depth of the target based on the coordinates of the target in the pixel plane and the pitch angle, the height and the camera internal parameters.
7. An apparatus for determining a pitch angle between a monocular camera and an area to be detected of a road surface, the monocular camera being designed based on the pinhole imaging principle and deployed at a road side, a roll angle between the monocular camera and the area to be detected being not greater than a preset value, characterized in that the apparatus comprises:
the detection module is used for acquiring pictures of the to-be-detected region at the same visual angle and different moments based on the monocular camera, acquiring track points of the moving object in the to-be-detected region based on the pictures, and determining a feature extraction region in the to-be-detected region based on the track points;
the determining module is used for detecting vanishing points in the feature extraction area and determining pitch angles of the monocular camera and the area to be detected based on the positions of the vanishing points; wherein,
the determining a feature extraction region in the region to be detected based on the track point comprises:
when the number of the track points is larger than a preset threshold value, determining left and right edges of a rectangular area based on the maximum and minimum values of the horizontal coordinates of the track points, determining upper and lower edges of the rectangular area based on the maximum and minimum values of the vertical coordinates of the track points, and reserving preset widths between each edge of the rectangular area and each edge of the picture as buffer spaces;
and taking the lower edge of the rectangular area as the lower edge of the triangular area, taking the average value of the abscissa of the track points forming the upper edge of the rectangular area as the vertex of the triangular area, and taking the triangular area as the feature extraction area.
8. A monocular camera depth estimation apparatus, the monocular camera being designed based on the pinhole imaging principle and deployed at a road side, a roll angle between the monocular camera and an area to be detected of a road surface being not greater than a preset value, characterized in that the apparatus comprises:
the target detection module is used for carrying out 2D target detection on the picture of the region to be detected, which is acquired by the monocular camera, so as to obtain the coordinates of the target on the pixel plane;
a parameter acquisition module for determining the pitch angle of the monocular camera and the area to be detected based on the method of any one of claims 1-5, and acquiring the height of the monocular camera relative to the road surface and the camera internal parameters of the monocular camera;
and the depth estimation module is used for estimating the depth of the target based on the coordinate of the target on the pixel plane, the pitch angle, the height and the camera internal parameters.
9. A monocular camera depth estimation device comprising a memory and a processor:
the memory is used for storing a computer program;
the processor being adapted to implement the method according to any one of claims 1-5 or claim 6 when executing the computer program.
CN202410038590.0A 2024-01-11 2024-01-11 Method, device and equipment for determining pitch angle and estimating depth of monocular camera Active CN117557616B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410038590.0A CN117557616B (en) 2024-01-11 2024-01-11 Method, device and equipment for determining pitch angle and estimating depth of monocular camera


Publications (2)

Publication Number Publication Date
CN117557616A CN117557616A (en) 2024-02-13
CN117557616B (en) 2024-04-02

Family

ID=89816993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410038590.0A Active CN117557616B (en) 2024-01-11 2024-01-11 Method, device and equipment for determining pitch angle and estimating depth of monocular camera

Country Status (1)

Country Link
CN (1) CN117557616B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103729837A (en) * 2013-06-25 2014-04-16 长沙理工大学 Rapid calibration method of single road condition video camera
CN105160309A (en) * 2015-08-24 2015-12-16 北京工业大学 Three-lane detection method based on image morphological segmentation and region growing
CN108108667A (en) * 2017-12-01 2018-06-01 大连理工大学 A kind of front vehicles fast ranging method based on narrow baseline binocular vision
WO2019000945A1 (en) * 2017-06-28 2019-01-03 京东方科技集团股份有限公司 On-board camera-based distance measurement method and apparatus, storage medium, and electronic device
CN112907678A (en) * 2021-01-25 2021-06-04 深圳佑驾创新科技有限公司 Vehicle-mounted camera external parameter attitude dynamic estimation method and device and computer equipment
CN114037977A (en) * 2022-01-07 2022-02-11 深圳佑驾创新科技有限公司 Road vanishing point detection method, device, equipment and storage medium
CN114091521A (en) * 2021-12-09 2022-02-25 深圳佑驾创新科技有限公司 Method, device and equipment for detecting vehicle course angle and storage medium
CN115131273A (en) * 2021-03-26 2022-09-30 北京四维图新科技股份有限公司 Information processing method, ranging method and device
CN116777966A (en) * 2023-06-21 2023-09-19 上海华测导航技术股份有限公司 Method for calculating course angle of vehicle in farmland pavement environment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8355539B2 (en) * 2007-09-07 2013-01-15 Sri International Radar guided vision system for vehicle validation and vehicle motion characterization
EP2704096A1 (en) * 2012-08-29 2014-03-05 Delphi Technologies, Inc. Method for calibrating an image capturing device


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
EdgeCalib: Multi-Frame Weighted Edge Features for Automatic Targetless LiDAR-Camera Calibration; Xingchen Li et al.; arXiv; 2023-10-25; pp. 1-7 *
Using Vanishing Points for Camera Calibration; B. Caprile et al.; International Journal of Computer Vision; 1990; pp. 127-140 *
Vanishing-point-based pitch angle calibration method for vehicle-mounted cameras (基于消失点的车载相机俯仰角标定方法); Hong Feng et al.; Science and Technology & Innovation (科技与创新); 2019, No. 15; pp. 94, 100 *

Also Published As

Publication number Publication date
CN117557616A (en) 2024-02-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant