CN116295237A - Monocular camera ranging method and device, storage medium and electronic equipment - Google Patents

Monocular camera ranging method and device, storage medium and electronic equipment

Info

Publication number
CN116295237A
CN202310113266.6A CN116295237A
Authority
CN
China
Prior art keywords
target
frame
determining
image
alternative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310113266.6A
Other languages
Chinese (zh)
Inventor
梅骁
刘明春
张智清
李春
谭福伦
聂石启
周洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
King Long United Automotive Industry Suzhou Co Ltd
Original Assignee
King Long United Automotive Industry Suzhou Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by King Long United Automotive Industry Suzhou Co Ltd filed Critical King Long United Automotive Industry Suzhou Co Ltd
Priority to CN202310113266.6A
Publication of CN116295237A
Legal status: Pending

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C3/00Measuring distances in line of sight; Optical rangefinders
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C3/00Measuring distances in line of sight; Optical rangefinders
    • G01C3/02Details
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application discloses a monocular camera ranging method and device, a storage medium and electronic equipment. The method comprises the following steps: acquiring a target image by using a monocular camera, and inputting the target image into a preset target detection model; in the target detection model, determining a target object in the target image and a target frame corresponding to the target object; determining the image height of the target object in the target image according to the two-dimensional coordinates of the target frame; and determining the target distance between the target object and the monocular camera according to the focal length of the monocular camera, the actual height of the target object and the image height. The method avoids the complexity and high hardware requirements of conventional monocular camera ranging methods.

Description

Monocular camera ranging method and device, storage medium and electronic equipment
Technical Field
The application relates to the field of automatic driving, in particular to a monocular camera ranging method and device, a storage medium and electronic equipment.
Background
Sensing technology plays a vital role in automatic driving systems. Based on the various sensors installed on an unmanned vehicle, the surrounding environment can be detected and possible obstacles around the vehicle identified; the sensing system thus serves as the "eyes" of the unmanned vehicle. Common sensing sensors include lidar, millimeter-wave radar and cameras. The data collected by a camera is rich in feature information and low in cost; however, existing methods for measuring distance from camera data are complex and place high demands on hardware. For example, CN202210810678.0 discloses a monocular ranging system and method for an underground unmanned vehicle that predicts based on a Kalman filtering algorithm and relies on a multi-target fusion algorithm; the algorithmic complexity is high, and implementation places high demands on parameter-tuning capability and hardware performance. CN202210456355.6 discloses a monocular real-time ranging method based on deep-learning target detection that uses ground reference points to automatically estimate the mapping between the three-dimensional coordinates of a target and the two-dimensional coordinates of the corresponding camera image; the method must accurately identify the actual coordinates of reference points such as ground lane lines and the corner points or outlines of traffic signs, which places high demands on the accuracy of the recognition algorithm. CN202210897606.4 proposes a monocular ranging method and system based on an adaptive target detection network and license plate detection; its ranging accuracy depends on contour accuracy, but contour recognition varies considerably under different target poses and illumination conditions, so monocular ranging with this method lacks robustness and its accuracy cannot be guaranteed.
Disclosure of Invention
In view of the above, the application provides a monocular camera ranging method and device, a storage medium and electronic equipment, which overcome the complexity and high hardware requirements of conventional monocular camera ranging methods while ensuring accuracy.
According to one aspect of the present application, there is provided a monocular camera ranging method, including:
acquiring a target image by using a monocular camera, and inputting the target image into a preset target detection model;
in the target detection model, determining a target object in the target image and a target frame corresponding to the target object;
determining the image height of the target object in the target image according to the two-dimensional coordinates of the target frame;
and determining a target distance between the target object and the monocular camera according to the focal length of the monocular camera, the actual height of the target object and the image height.
Optionally, the determining the target object in the target image and the target frame corresponding to the target object includes:
extracting image features in the target image, and generating a feature map according to the image features;
dividing the feature map into a preset number of sub-areas, and detecting each sub-area by using a predefined detection frame to obtain at least one candidate frame and a confidence corresponding to the candidate frame, wherein the candidate frame contains a candidate object;
and determining a target frame among the candidate frames according to a preset threshold of the target detection model and the confidence corresponding to each candidate frame, and determining the candidate object contained in the target frame as the target object.
Optionally, the preset threshold includes a confidence threshold and an intersection ratio threshold;
correspondingly, the determining the target frame among the candidate frames according to the preset threshold of the target detection model and the confidence corresponding to each candidate frame comprises the following steps:
removing candidate frames whose confidence is smaller than the confidence threshold from the at least one candidate frame to obtain a candidate set;
determining the candidate frame with the highest confidence in the candidate set as a base frame, removing the base frame from the candidate set, and adding the base frame to a base frame set;
determining the intersection ratio between each candidate frame in the candidate set and the base frame, eliminating the candidate frames whose intersection ratio is greater than the intersection ratio threshold, and returning to the step of determining the candidate frame with the highest confidence in the candidate set as the base frame until the candidate set is empty;
and determining the candidate frames in the base frame set as the target frames.
Optionally, the determining the image height of the target object in the target image according to the two-dimensional coordinates of the target frame includes:
and determining the pixel number of the target frame in the y-axis direction according to the two-dimensional coordinates of the target frame, and determining the image height according to the pixel number.
Optionally, the method further comprises:
presetting a plurality of preset categories, and respectively collecting object samples corresponding to each preset category;
determining the average value of the heights of the object samples corresponding to each preset category as the category height corresponding to the preset category;
accordingly, before the determining the target distance between the target object and the monocular camera according to the focal length of the monocular camera, the actual height of the target object, and the image height, the method further includes:
and determining the target class of the target object in the target detection model, and determining the class height corresponding to the target class as the actual height of the target object.
Optionally, before the inputting the target image into a preset target detection model, the method includes:
and training the target detection model by using a pre-labeled training image, and adjusting a preset threshold value of the target detection model.
Optionally, before the training the target detection model using the pre-labeled training image, the method further includes:
acquiring a plurality of original training images, wherein the original training images contain objects to be marked, and the number of objects to be marked corresponding to each preset category is greater than a first preset quantity threshold;
and respectively determining the number of pixels occupied by each object to be marked, and marking the object to be marked if the number of pixels is larger than a second preset number threshold value, so as to obtain the pre-marked training image.
According to another aspect of the present application, there is provided a monocular camera ranging apparatus, the apparatus comprising:
the shooting module is used for acquiring a target image by using a monocular camera and inputting the target image into a preset target detection model;
the detection module is used for determining a target object in the target image and a target frame corresponding to the target object in the target detection model;
the operation module is used for determining the image height of the target object in the target image according to the two-dimensional coordinates of the target frame; and determining a target distance between the target object and the monocular camera according to the focal length of the monocular camera, the actual height of the target object and the image height.
Optionally, the detection module is configured to:
extracting image features in the target image, and generating a feature map according to the image features;
dividing the feature map into a preset number of sub-areas, and detecting each sub-area by using a predefined detection frame to obtain at least one candidate frame and a confidence corresponding to the candidate frame, wherein the candidate frame contains a candidate object;
and determining a target frame among the candidate frames according to a preset threshold of the target detection model and the confidence corresponding to each candidate frame, and determining the candidate object contained in the target frame as the target object.
Optionally, the preset threshold includes a confidence threshold and an intersection ratio threshold;
accordingly, the detection module is used for:
removing candidate frames whose confidence is smaller than the confidence threshold from the at least one candidate frame to obtain a candidate set;
determining the candidate frame with the highest confidence in the candidate set as a base frame, removing the base frame from the candidate set, and adding the base frame to a base frame set;
determining the intersection ratio between each candidate frame in the candidate set and the base frame, eliminating the candidate frames whose intersection ratio is greater than the intersection ratio threshold, and returning to the step of determining the candidate frame with the highest confidence in the candidate set as the base frame until the candidate set is empty;
and determining the candidate frames in the base frame set as the target frames.
Optionally, the operation module is configured to:
and determining the pixel number of the target frame in the y-axis direction according to the two-dimensional coordinates of the target frame, and determining the image height according to the pixel number.
Optionally, the apparatus further comprises an initialization module for:
presetting a plurality of preset categories, and respectively collecting object samples corresponding to each preset category;
determining the average value of the heights of the object samples corresponding to each preset category as the category height corresponding to the preset category;
correspondingly, the detection module is further configured to: determining a target class of the target object in the target detection model;
the operation module is also used for: and determining the class height corresponding to the target class as the actual height of the target object.
Optionally, the initialization module is configured to:
and training the target detection model by using a pre-labeled training image, and adjusting a preset threshold value of the target detection model.
Optionally, the initialization module is further configured to:
acquiring a plurality of original training images, wherein the original training images contain objects to be marked, and the number of objects to be marked corresponding to each preset category is greater than a first preset quantity threshold;
and respectively determining the number of pixels occupied by each object to be marked, and marking the object to be marked if the number of pixels is larger than a second preset number threshold value, so as to obtain the pre-marked training image.
According to still another aspect of the present application, there is provided a storage medium having stored thereon a program or instructions which, when executed by a processor, implement the above-described monocular camera ranging method.
According to still another aspect of the present application, there is provided an electronic device including a storage medium storing a computer program and a processor implementing the above monocular camera ranging method when the processor executes the computer program.
By means of the technical scheme, the target object in the image is detected by a target detection algorithm, which has low hardware requirements and high detection precision; in addition, the ranging algorithm adopts a simple geometric model and, compared with machine-learning-based ranging algorithms, is simpler and offers better real-time performance. Furthermore, the target frame corresponding to the target object and the target distance can be displayed on a display screen in real time, so that a user can intuitively see, in real time, the distance between each object in the surrounding environment and the camera.
The foregoing description is only an overview of the technical solutions of the present application. In order that the technical means of the present application may be understood more clearly and implemented in accordance with the content of the specification, and in order to make the above and other objects, features and advantages of the present application more apparent, the detailed description of the present application is given below.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
fig. 1 shows a schematic flow chart of a monocular camera ranging method according to an embodiment of the present application;
fig. 2 is a schematic flow chart of another monocular camera ranging method according to an embodiment of the present disclosure;
fig. 3 is a schematic flow chart of another monocular camera ranging method according to an embodiment of the present disclosure;
fig. 4 is a schematic flow chart of another monocular camera ranging method according to an embodiment of the present disclosure;
fig. 5 is a schematic view of a target object grounding point coordinate of another monocular camera ranging method according to an embodiment of the present disclosure;
fig. 6 shows a schematic diagram of a monocular ranging application of another monocular camera ranging method provided in an embodiment of the present application;
fig. 7 shows a block diagram of a monocular camera ranging apparatus according to an embodiment of the present application.
Detailed Description
The present application will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other.
In this embodiment, a monocular camera ranging method is provided, as shown in fig. 1, and the method includes the following steps:
step 101, acquiring a target image by using a monocular camera, and inputting the target image into a preset target detection model;
102, in a target detection model, determining a target object in a target image and a target frame corresponding to the target object;
step 103, determining the image height of the target object in the target image according to the two-dimensional coordinates of the target frame;
step 104, determining the target distance between the target object and the monocular camera according to the focal length of the monocular camera, the actual height of the target object and the image height.
In the monocular camera ranging method provided by this embodiment, the distance between a target object in an image and the camera is determined from the image captured by the camera. Specifically, the embodiment mainly comprises two stages: target detection and monocular ranging. The target detection stage is shown in steps 101-102: a target image is first acquired with a monocular camera and input into a preset target detection model, which performs target detection on the target image to obtain the target object in it.
Wherein the target detection model may be a yolo-v5s algorithm model. It will be appreciated that yolo is an open-source target detection model, a neural network that can predict the classes and bounding boxes of objects. yolo-v5s is a variant of yolo-v5, the network with the smallest depth and smallest feature-map width in the yolo-v5 series.
In steps 103-104, after the target object is obtained, the height of the target object in the image and its real height in the actual scene are determined, and the target distance between the target object and the camera is calculated geometrically in combination with the focal length of the camera.
Specifically, after the target object is determined by the target detection model and the corresponding target frame is output, the grounding point of the target object can be determined geometrically. For example, according to the two-dimensional coordinates of the target frame, the midpoint of the bottom edge of the target frame (the edge with the largest ordinate) may be taken as the grounding point of the target object. The grounding point serves as the measuring position of the target object; in the subsequent calculation, the distance between the grounding point and the optical center of the camera is estimated and taken as the target distance between the target object and the camera.
Wherein, the target distance can be calculated using the following formula:
d = f * H / h
wherein d represents the distance between the target object and the camera, f represents the focal length of the camera, H is the actual height of the target object, and h is the image height of the target object in the target image.
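As an illustration of this similar-triangle relationship, the following minimal sketch computes d from f, H and h; the function name and unit convention are assumptions for illustration, and f and h must be expressed in the same units (e.g. both in metres on the sensor, or both in pixels).

```python
# Minimal sketch of d = f * H / h; names and unit convention are illustrative.
def monocular_distance(f: float, actual_height: float, image_height: float) -> float:
    """Estimate the target-to-camera distance with the pinhole model."""
    if image_height <= 0:
        raise ValueError("image height must be positive")
    return f * actual_height / image_height

# With the values used later in the description (f = 0.006 m, H = 1.65 m,
# h = 0.002 m) this gives about 4.95 m.
print(monocular_distance(0.006, 1.65, 0.002))  # ~4.95
```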
The embodiment uses a target detection algorithm to detect a target object in an image, and has the characteristics of low hardware requirements, high target detection precision and the like; in addition, the ranging algorithm adopts a simpler geometric model method, and compared with the ranging algorithm based on machine learning, the ranging algorithm is simpler and has better calculation instantaneity. In addition, the target frame corresponding to the target object can be displayed on the display screen in real time, the target distance is displayed, and a user can intuitively know the distance between each object in the surrounding environment and the camera through the display screen in real time.
Further, in step 103, determining the image height of the target object in the target image according to the two-dimensional coordinates of the target frame includes the following steps:
step 1031, determining the number of pixel points of the target frame in the y-axis direction according to the two-dimensional coordinates of the target frame, and determining the image height according to the number of pixel points.
In step 1031, it can be understood that, within the same image, the more pixels an object occupies in the vertical direction, the greater its height. Based on this, the embodiment determines the image height from the number of pixels of the target frame in the y-axis direction, and the pixel count can be used directly as the image height of the target object.
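As a sketch, the pixel height is simply the extent of the target frame along the y axis; the (x1, y1, x2, y2) box format is an assumed convention. Note that if this pixel count is used directly as h in d = f * H / h, the focal length f must likewise be expressed in pixels.

```python
# Illustrative sketch: image height as the y-axis pixel extent of the target
# frame; box format (x1, y1, x2, y2) is an assumed convention.
def image_height_px(box: tuple) -> float:
    x1, y1, x2, y2 = box
    return abs(y2 - y1)

# With the example frame used later, spanning (124, 64) to (232, 166):
print(image_height_px((124, 64, 232, 166)))  # 102 pixels
```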
Further, as a refinement and extension of the foregoing embodiment, in order to fully describe the implementation procedure of this embodiment, another monocular camera ranging method is provided, as shown in fig. 2, where the method includes the following steps:
step 201, acquiring a target image by using a monocular camera, and inputting the target image into a preset target detection model;
step 202, extracting image features in a target image in a target detection model, and generating a feature map according to the image features;
step 203, in the target detection model, dividing the feature map into a preset number of sub-areas, and detecting each sub-area by using a predefined detection frame to obtain at least one candidate frame and a confidence corresponding to the candidate frame, wherein the candidate frame contains a candidate object;
step 204, in the target detection model, determining a target frame among the candidate frames according to a preset threshold of the target detection model and the confidence corresponding to each candidate frame, and determining the candidate object contained in the target frame as the target object.
In steps 202-204, the target image is input into the feature extraction module of the target detection model, where image features are extracted to generate a feature map; the feature map expresses the deep semantic features of the image. The feature map is then divided into several sub-areas. Assuming a target object may exist in each sub-area, detection is performed with predefined detection frames, and the possibly contained objects and the corresponding detection frames, i.e., candidate objects and candidate frames, are output. The confidence corresponding to each candidate frame is output at the same time, indicating the likelihood that a target object is present in that candidate frame. Finally, low-likelihood candidate frames are eliminated according to the preset threshold of the target detection model and the confidence of each candidate frame, yielding the target frames; the candidate objects contained in the target frames are taken as the target objects.
This embodiment uses the target detection model to obtain the target object in the image and the target frame containing it. In the target detection model, the target frame is the bounding box of the target object, so that in subsequent steps the height of the target object in the image can be approximated from the two-dimensional coordinates of the target frame, and the distance estimated from that height.
Preferably, in the embodiment of the present application, the preset threshold includes a confidence threshold and an intersection ratio threshold.
It will be appreciated that the confidence indicates the likelihood of an event; in yolo it indicates the likelihood that an object is present in the current box. Thus, the smaller the confidence threshold, the more target objects are detected but the lower the accuracy; the greater the confidence threshold, the fewer target objects are detected but the higher the accuracy. The confidence threshold may be set according to actual application requirements, historical experience, test results, etc., so as to balance the number of detected target objects against accuracy; in practice it may be set to a value between 0.6 and 0.85.
The intersection ratio (Intersection over Union, IoU) is the ratio of the intersection to the union of two detection frames; in this embodiment the overlap of two candidate frames is judged by their intersection ratio. If the intersection ratio of two candidate frames exceeds the intersection ratio threshold, the two frames are considered to identify the same object. Therefore, the greater the intersection ratio threshold, the less likely a candidate frame is to be suppressed and the more target objects are detected, but the greater the likelihood of duplicate detections; the smaller the intersection ratio threshold, the fewer target objects are detected and the less likely duplicates become. The intersection ratio threshold may likewise be set according to actual application requirements, historical experience, test results, etc., so as to balance the number of target objects against the duplication rate; in practice it may be set to a value between 0.4 and 0.6.
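For reference, a minimal intersection-ratio computation for two axis-aligned frames might look like the following sketch; the (x1, y1, x2, y2) box format is an assumption.

```python
# Minimal intersection-over-union sketch for axis-aligned boxes (x1, y1, x2, y2).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # ~0.333 for half-overlapping boxes
```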
Accordingly, as shown in fig. 3, in step 204, determining the target frame among the candidate frames according to the preset threshold of the target detection model and the confidence corresponding to each candidate frame includes the following steps:
step 2041: removing candidate frames whose confidence is smaller than the confidence threshold from the at least one candidate frame to obtain a candidate set;
step 2042: determining the candidate frame with the highest confidence in the candidate set as a base frame, removing the base frame from the candidate set, and adding the base frame to the base frame set;
step 2043: determining the intersection ratio between each candidate frame in the candidate set and the base frame, eliminating the candidate frames whose intersection ratio is greater than the intersection ratio threshold, and returning to the step of determining the candidate frame with the highest confidence in the candidate set as the base frame until the candidate set is empty;
step 2044: determining the candidate frames in the base frame set as target frames.
In steps 2041-2044, target frames are determined among the candidate frames according to the confidence threshold and the intersection ratio threshold. Candidate frames unlikely to contain a target object are removed first according to the confidence threshold. Specifically, the confidence of each candidate frame is compared with the confidence threshold; if the confidence is smaller than the threshold, the candidate frame is considered unlikely to contain a target object and is therefore rejected. The remaining candidate frames form the candidate set, and each of them is likely to contain a target object.
For example, suppose there are five candidate frames: frame A with confidence 0.8, frame B with 0.7, frame C with 0.7, frame D with 0.6 and frame E with 0.3. With a confidence threshold of 0.6, candidate frame E is eliminated, and the remaining frames A, B, C and D form the candidate set.
Duplicate candidate frames of the same target object are then eliminated according to the intersection ratio threshold. Specifically, the candidate frame with the highest confidence in the candidate set is taken as the base frame and removed from the candidate set; the intersection ratio between each remaining candidate frame and the base frame is then evaluated, and if it exceeds the intersection ratio threshold, the candidate frame is considered to identify the same target object as the base frame and is suppressed to avoid duplication. Suppressed candidate frames are removed from the candidate set; the remaining candidate frames can be regarded as detection frames that do not duplicate the object identified by the base frame, and are used for the next round of selection. The loop repeats until the candidate set is empty, at which point all candidate frames have been judged; the frames then in the base frame set are taken as the target frames, of which there may be one or several.
For example, with the intersection ratio threshold set to 0.5, the highest-confidence frame in the candidate set formed by frames A, B, C and D, namely frame A, is selected as the base frame and added to the base frame set; frame A is removed from the candidate set, leaving B, C and D. The intersection ratios between frames B, C, D and base frame A are then evaluated; if the intersection ratio between B and A is 0.8, between C and A is 0.3, and between D and A is 0.2, candidate frame B is suppressed. Frame B is removed from the candidate set, leaving C and D for the next round. In the new round, the highest-confidence frame in the candidate set, namely frame C, is selected as the base frame and added to the base frame set; frame C is removed, leaving frame D. If the intersection ratio between D and base frame C is greater than 0.5, frame D is suppressed and removed from the candidate set. The candidate set is now empty, and the loop ends. The base frame set now contains frames A and C, which are the target frames.
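Steps 2041-2044 amount to standard greedy non-maximum suppression. A self-contained sketch follows; box format, function names and the default thresholds (the example values used later in step 303) are illustrative assumptions.

```python
# Sketch of the greedy selection in steps 2041-2044 (non-maximum suppression).
# Boxes are (x1, y1, x2, y2) with a confidence score; names are illustrative.
def select_target_frames(boxes, scores, conf_thr=0.65, iou_thr=0.45):
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    # Step 2041: drop candidates below the confidence threshold.
    candidates = [(b, s) for b, s in zip(boxes, scores) if s >= conf_thr]
    base_set = []
    while candidates:  # loop until the candidate set is empty
        # Step 2042: the highest-confidence candidate becomes the base frame.
        candidates.sort(key=lambda bs: bs[1], reverse=True)
        base, _ = candidates.pop(0)
        base_set.append(base)
        # Step 2043: suppress candidates overlapping the base frame too much.
        candidates = [(b, s) for b, s in candidates if iou(b, base) <= iou_thr]
    return base_set  # step 2044: the base frame set is the set of target frames
```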
Preferably, in an embodiment of the present application, the method further includes the following steps:
presetting a plurality of preset categories, and respectively collecting object samples corresponding to each preset category; and determining the average value of the heights of the object samples corresponding to each preset category as the category height corresponding to the preset category.
Accordingly, before determining the target distance between the target object and the monocular camera based on the focal length of the monocular camera, the actual height of the target object, and the image height, the method further comprises:
and determining the target class of the target object in the target detection model, and determining the class height corresponding to the target class as the actual height of the target object.
In particular, since the actual height of the target object is inconvenient to measure, the actual height may be set for the target object according to its category. Specifically, a plurality of target object categories to be measured are preset, a plurality of groups of samples are collected for each category, the average value of the heights of the samples is taken, and the average value is taken as the actual height of the target object of the category. Accordingly, in the target detection step, the target detection model may classify the detected target object, and further determine the actual height according to the classification of the target object.
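A sketch of the per-class height table follows; the sample values are assumed, illustrative numbers rather than measured data.

```python
# Sketch: the class height is the mean height of collected samples per class.
from statistics import mean

height_samples = {            # hypothetical sample heights in metres
    "pedestrian": [1.60, 1.70, 1.65],
    "vehicle":    [1.95, 2.05, 2.00],
}
class_height = {cls: mean(h) for cls, h in height_samples.items()}

def actual_height(target_class: str) -> float:
    """Look up the actual height assigned to a detected target's class."""
    return class_height[target_class]

print(actual_height("pedestrian"))  # 1.65
```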
Preferably, before inputting the target image into the preset target detection model, the method comprises:
training a target detection model by using a pre-labeled training image, and adjusting a preset threshold of the target detection model.
In this embodiment, specifically, the training images are used to run the training code of the yolo-v5s algorithm model; a new model file is obtained after training is completed and is placed under the read directory specified by the target detection module code. After the target detection model is trained, the preset threshold is adjusted, where the preset threshold includes the confidence threshold and the intersection ratio threshold; the adjustment method and its principle are as described above and are not repeated here.
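By way of a hedged sketch only: with the public ultralytics/yolov5 repository, training is typically invoked through its train.py script, as shown below. The dataset YAML, image size and file paths are placeholders; the embodiment's exact training setup is not disclosed beyond the batch size of 32 mentioned in step 302.

```python
# Hypothetical invocation of the yolov5 training script from Python;
# paths and the dataset YAML are placeholders.
import subprocess

subprocess.run(
    [
        "python", "train.py",       # ultralytics/yolov5 training entry point
        "--img", "640",             # assumed input resolution
        "--batch", "32",            # batch_size used in this embodiment
        "--data", "dataset.yaml",   # placeholder dataset description
        "--weights", "yolov5s.pt",  # start from pretrained yolo-v5s weights
    ],
    check=True,
)
```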
Preferably, the method for labeling the training image comprises the following steps:
acquiring a plurality of original training images, wherein the original training images contain objects to be marked, and the number of objects to be marked corresponding to each preset category is greater than a first preset quantity threshold;
and respectively determining the number of pixels occupied by each object to be marked, and marking the object to be marked if the number of pixels is larger than a second preset number threshold value, so as to obtain a pre-marked training image.
In this embodiment, original training images are acquired, and the objects to be marked in them are labeled to obtain pre-labeled training images, which are then input into the target detection model to train it. A plurality of original training images can be acquired, ensuring that the number of objects to be marked for each preset category is greater than the first preset quantity threshold, so as to improve the detection accuracy for that category of target object. In addition, objects that appear too small in the image, i.e., objects to be marked occupying fewer pixels than the second preset quantity threshold, may be left unprocessed; only objects whose pixel count reaches the second preset quantity threshold are labeled.
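The two labeling rules can be sketched as a simple filter; the threshold values and the (class, pixel_count) record format are illustrative assumptions.

```python
# Sketch of the two labeling rules: a minimum instance count per class and a
# minimum pixel count per object; values and record format are illustrative.
from collections import Counter

MIN_INSTANCES_PER_CLASS = 100  # first preset quantity threshold
MIN_PIXELS_PER_OBJECT = 10     # second preset quantity threshold

def filter_annotations(objects):
    """objects: iterable of (class_name, pixel_count) pairs."""
    kept = [(c, px) for c, px in objects if px >= MIN_PIXELS_PER_OBJECT]
    counts = Counter(c for c, _ in kept)
    short = [c for c, n in counts.items() if n < MIN_INSTANCES_PER_CLASS]
    if short:
        print("collect more labeled samples for:", short)
    return kept
```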
Fig. 4 shows a flow chart of another monocular camera ranging method. This embodiment runs on an electronic device comprising the following hardware: a camera with a 6 mm focal length, an industrial personal computer, a display, a power supply, etc. The 6 mm camera has a resolution of 1920×1080 and uses a USB 3.0 interface for power supply and data transmission. Industrial personal computer configuration: CPU i7; GPU 2070 Super (8 GB video memory); RAM 32 GB.
As shown in fig. 4, the method comprises the steps of:
step 301: marking data;
In this step, the LabelImg labeling software is used. During labeling, to reduce the probability of false detection as much as possible, small distant targets (for example, targets occupying fewer than 10 pixels) are not labeled; to prevent missed detections, at least 100 images containing targets are captured and labeled for each category.
Step 302: training a model;
In this step, the yolo-v5 algorithm model is adopted as the training model, specifically the lighter-weight yolo-v5s algorithm model as the original neural network model. To speed up training and help the algorithm converge faster, the batch_size (the amount of data the network uses for one parameter update) should in theory be set as large as possible; due to the video memory limit of the graphics card, it is set to 32 in this embodiment.
Step 303: detecting a target;
In this step, firstly, the model obtained by training is copied to the designated path; secondly, some parameters are adjusted so that part of the false detections can be filtered out, the two parameters mainly adjusted being the confidence threshold, set to 0.65, and the non-maximum-suppression (intersection ratio) threshold, set to 0.45; finally, the default model type is set to the yolo-v5s algorithm model, ensuring that the program calls the relevant yolo-v5s network parameters when executing.
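As a hedged sketch, a yolov5 model loaded through torch.hub exposes conf and iou attributes corresponding to these two thresholds; the weight name and image path below are placeholders, and this is not necessarily how the embodiment integrates the model.

```python
# Sketch using the public yolov5 torch.hub interface; the weight name and
# image path are placeholders.
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s")  # or custom weights
model.conf = 0.65  # confidence threshold set in this step
model.iou = 0.45   # non-maximum-suppression (IoU) threshold set in this step

results = model("target_image.jpg")  # placeholder image path
print(results.xyxy[0])  # rows of [x1, y1, x2, y2, confidence, class]
```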
Step 304: judging a grounding point;
In this step, as shown in fig. 5, the four vertex coordinates of the target frame are expressed as (x_A, y_A), (x_B, y_B), (x_C, y_C), (x_D, y_D), and the four vertex coordinates satisfy the following relationship:
y_A = y_B, y_C = y_D, x_A = x_D, x_B = x_C
The grounding point P(x_P, y_P) is then calculated as:
x_P = 0.5 * (x_C + x_D)
y_P = y_C = y_D
for example, assuming that four vertex coordinates of a target frame detected by a certain target are (124, 64), (232, 64), (124, 166), (232, 166), respectively, the ground point coordinates of the target are (178, 166).
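The grounding-point rule can be sketched directly from the vertex relations above; the vertex ordering follows fig. 5, where C and D are the two bottom vertices.

```python
# Sketch of step 304: the grounding point is the midpoint of the bottom edge
# of the target frame (vertices C and D, the ones with the largest ordinate).
def grounding_point(C, D):
    (xC, yC), (xD, yD) = C, D
    return 0.5 * (xC + xD), yC  # y_P = y_C = y_D

# Example from the description: vertices (124, 64), (232, 64), (124, 166),
# (232, 166) give the grounding point (178, 166).
print(grounding_point((232, 166), (124, 166)))  # (178.0, 166)
```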
Step 305: monocular ranging.
In this step, as shown in fig. 6, a camera P is installed on an unmanned vehicle A, with a pedestrian B and a vehicle C in front of the vehicle. HA is the installation height of the camera on the unmanned vehicle A, Z1 is the distance from pedestrian B to camera P, and Z2 is the distance from vehicle C to camera P; the focal length of camera P is set as f. The upper left corner of fig. 6 is an enlarged view of the interior of camera P: y1 and y2 are the images of the grounding points of pedestrian B and vehicle C on the camera's photosensitive element, with coordinates (x_B, y_B) and (x_C, y_C) respectively. The specific steps of the monocular ranging procedure are as follows:
Step 3051: collect height data for 100 groups of vehicles of the class and 100 pedestrians, and take the average of each group as the true height of that category of target, the pedestrian height being denoted H_p and the vehicle height H_v.
Step 3052: coordinates of the ground points of the pedestrian B and the vehicle C in the image can be calculated from the formula of the foregoing step 304;
Step 3053: the relationship between the heights of pedestrian B and vehicle C on the imaging plane and the target-to-camera distance can be obtained from fig. 6 and the camera imaging principle:
Pedestrian: y1 = f * H_p / Z1
Vehicle: y2 = f * H_v / Z2
Step 3054: since the heights of pedestrian B and vehicle C on the imaging plane are known, and the focal length of the camera can be obtained by intrinsic calibration or from the camera manufacturer, the distances from pedestrian B and vehicle C to the camera follow from the formulas of step 3053:
Distance from pedestrian to camera: Z1 = f * H_p / y1
Distance from vehicle to camera: Z2 = f * H_v / y2
For example, in this embodiment the focal length of the camera is 6 mm, the average height of a pedestrian is 1.65 meters, and the average height of a vehicle is 2 meters. When the detected height of the pedestrian in the camera coordinate system is 0.002 meters and that of the vehicle is 0.0015 meters, it can be calculated that the distance from the pedestrian to the camera is Z1 = 0.006 × 1.65 / 0.002 = 4.95 meters, and the distance from the vehicle to the camera is Z2 = 0.006 × 2 / 0.0015 = 8 meters.
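The worked example reduces to two applications of the step 3054 formulas; the values below are those stated above.

```python
# Reproducing the worked example of step 305.
f = 0.006               # focal length in metres (6 mm)
H_p, H_v = 1.65, 2.0    # average pedestrian / vehicle heights in metres
y1, y2 = 0.002, 0.0015  # imaged heights on the photosensitive element, metres

Z1 = f * H_p / y1       # distance from pedestrian B to camera P
Z2 = f * H_v / y2       # distance from vehicle C to camera P
print(round(Z1, 2), round(Z2, 2))  # 4.95 8.0
```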
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and shall not constitute any limitation on the implementation of the embodiments of the present invention.
Further, as a specific implementation of the above-mentioned monocular camera ranging method, the embodiment of the present application provides a monocular camera ranging device, as shown in fig. 7, which includes: shooting module, detection module and operation module.
The shooting module is used for acquiring a target image by using a monocular camera and inputting the target image into a preset target detection model;
the detection module is used for determining a target object in the target image and a target frame corresponding to the target object in the target detection model;
the operation module is used for determining the image height of the target object in the target image according to the two-dimensional coordinates of the target frame; and determining the target distance between the target object and the monocular camera according to the focal length of the monocular camera, the actual height of the target object and the image height.
In a specific application scenario, optionally, the detection module is configured to:
extracting image features in the target image, and generating a feature map according to the image features;
dividing the feature map into a preset number of sub-areas, and detecting each sub-area by using a predefined detection frame to obtain at least one candidate frame and a confidence corresponding to the candidate frame, wherein the candidate frame contains a candidate object;
and determining a target frame among the candidate frames according to a preset threshold of the target detection model and the confidence corresponding to each candidate frame, and determining the candidate object contained in the target frame as the target object.
In a specific application scenario, optionally, the preset threshold includes a confidence threshold and an intersection ratio threshold;
accordingly, the detection module is used for:
removing candidate frames whose confidence is smaller than the confidence threshold from the at least one candidate frame to obtain a candidate set;
determining the candidate frame with the highest confidence in the candidate set as a base frame, removing the base frame from the candidate set, and adding the base frame to a base frame set;
determining the intersection ratio between each candidate frame in the candidate set and the base frame, eliminating the candidate frames whose intersection ratio is greater than the intersection ratio threshold, and returning to the step of determining the candidate frame with the highest confidence in the candidate set as the base frame until the candidate set is empty;
and determining the candidate frames in the base frame set as the target frames.
In a specific application scenario, optionally, the operation module is configured to:
and determining the pixel number of the target frame in the y-axis direction according to the two-dimensional coordinates of the target frame, and determining the image height according to the pixel number.
In a specific application scenario, optionally, the apparatus further includes an initialization module, configured to:
presetting a plurality of preset categories, and respectively collecting object samples corresponding to each preset category;
determining the average value of the heights of the object samples corresponding to each preset category as the category height corresponding to the preset category;
correspondingly, the detection module is further configured to: determining a target class of a target object in a target detection model;
the operation module is also used for: and determining the class height corresponding to the target class as the actual height of the target object.
In a specific application scenario, optionally, the initialization module is configured to:
training a target detection model by using a pre-labeled training image, and adjusting a preset threshold of the target detection model.
In a specific application scenario, optionally, the initialization module is further configured to:
acquiring a plurality of original training images, wherein the original training images contain objects to be marked, and the number of objects to be marked corresponding to each preset category is greater than a first preset quantity threshold;
and respectively determining the number of pixels occupied by each object to be marked, and marking the object to be marked if the number of pixels is larger than a second preset number threshold value, so as to obtain a pre-marked training image.
It should be noted that, other corresponding descriptions of each functional module related to the monocular camera ranging device provided in the embodiments of the present application may refer to corresponding descriptions in the above method, and are not repeated herein.
Based on the above method, correspondingly, the embodiment of the application also provides a storage medium, on which a computer program is stored, which when executed by a processor, implements the monocular camera ranging method described above.
Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.), and includes several instructions for causing an electronic device (may be a personal computer, a server, or a network device, etc.) to perform the methods described in various implementation scenarios of the present application.
Based on the method shown in fig. 1 to 6 and the virtual device embodiment shown in fig. 7, in order to achieve the above object, the embodiment of the present application further provides an electronic device, which may specifically be a personal computer, a server, a network device, or the like, where the electronic device includes a storage medium and a processor; a storage medium storing a computer program; a processor for executing a computer program to implement the monocular camera ranging method as described above and shown in fig. 1 to 6.
Optionally, the electronic device may also include a user interface, a network interface, a camera, radio Frequency (RF) circuitry, sensors, audio circuitry, WI-FI modules, and the like. The user interface may include a Display screen (Display), an input unit such as a Keyboard (Keyboard), etc., and the optional user interface may also include a USB interface, a card reader interface, etc. The network interface may optionally include a standard wired interface, a wireless interface (e.g., bluetooth interface, WI-FI interface), etc.
It will be appreciated by those skilled in the art that the structure described in this embodiment does not constitute a limitation on the electronic device; the device may include more or fewer components, combine certain components, or arrange the components differently.
The storage medium may further include an operating system and a network communication module. The operating system is a program that manages and maintains the hardware and software resources of the electronic device, supporting the execution of the information processing program and other software and/or programs. The network communication module is used to implement communication among the components in the storage medium and with other hardware and software in the physical device.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus necessary general hardware platforms, or may be implemented by hardware.
Those skilled in the art will appreciate that the drawings are merely schematic illustrations of a preferred implementation scenario, and that the units or processes in the drawings are not necessarily required to practice the present application. Those skilled in the art will also appreciate that the units of an apparatus in an implementation scenario may be distributed in the apparatus as described in that scenario, or may be located, with corresponding changes, in one or more apparatuses different from the present implementation scenario. The units of the implementation scenario may be combined into one unit, or further split into a plurality of sub-units.
The foregoing embodiment serial numbers are merely for description and do not represent the superiority or inferiority of the implementation scenarios. The foregoing disclosure is merely a few specific implementation scenarios of the present application; however, the present application is not limited thereto, and any variation conceivable to a person skilled in the art shall fall within the protection scope of the present application.

Claims (10)

1. A monocular camera ranging method, the method comprising:
acquiring a target image by using a monocular camera, and inputting the target image into a preset target detection model;
in the target detection model, determining a target object in the target image and a target frame corresponding to the target object;
determining the image height of the target object in the target image according to the two-dimensional coordinates of the target frame;
and determining a target distance between the target object and the monocular camera according to the focal length of the monocular camera, the actual height of the target object and the image height.
2. The method according to claim 1, wherein the determining the target object in the target image and the target frame corresponding to the target object includes:
extracting image features in the target image, and generating a feature map according to the image features;
dividing the feature map into a preset number of sub-areas, and detecting each sub-area by using a predefined detection frame to obtain at least one candidate frame and a confidence corresponding to the candidate frame, wherein the candidate frame contains a candidate object;
and determining a target frame among the candidate frames according to a preset threshold of the target detection model and the confidence corresponding to each candidate frame, and determining the candidate object contained in the target frame as the target object.
3. The method of claim 2, wherein the preset threshold comprises a confidence threshold and an intersection ratio threshold;
correspondingly, the determining the target frame among the candidate frames according to the preset threshold of the target detection model and the confidence corresponding to each candidate frame comprises the following steps:
removing candidate frames whose confidence is smaller than the confidence threshold from the at least one candidate frame to obtain a candidate set;
determining the candidate frame with the highest confidence in the candidate set as a base frame, removing the base frame from the candidate set, and adding the base frame to a base frame set;
determining the intersection ratio between each candidate frame in the candidate set and the base frame, eliminating the candidate frames whose intersection ratio is greater than the intersection ratio threshold, and returning to the step of determining the candidate frame with the highest confidence in the candidate set as the base frame until the candidate set is empty;
and determining the candidate frames in the base frame set as the target frame.
4. The method of claim 1, wherein determining the image height of the target object in the target image based on the two-dimensional coordinates of the target frame comprises:
and determining the pixel number of the target frame in the y-axis direction according to the two-dimensional coordinates of the target frame, and determining the image height according to the pixel number.
5. The method according to claim 2, wherein the method further comprises:
presetting a plurality of preset categories, and respectively collecting object samples corresponding to each preset category;
determining the average value of the heights of the object samples corresponding to each preset category as the category height corresponding to the preset category;
accordingly, before the determining the target distance between the target object and the monocular camera according to the focal length of the monocular camera, the actual height of the target object, and the image height, the method further includes:
and determining the target class of the target object in the target detection model, and determining the class height corresponding to the target class as the actual height of the target object.
6. The method of claim 5, wherein prior to said inputting the target image into a pre-set target detection model, the method comprises:
and training the target detection model by using a pre-labeled training image, and adjusting a preset threshold value of the target detection model.
7. The method of claim 6, wherein prior to training the object detection model with the pre-labeled training image, the method further comprises:
acquiring a plurality of original training images, wherein the original training images contain objects to be marked, and the number of objects to be marked corresponding to each preset category is greater than a first preset quantity threshold;
and respectively determining the number of pixels occupied by each object to be marked, and marking the object to be marked if the number of pixels is larger than a second preset number threshold value, so as to obtain the pre-marked training image.
8. A monocular camera ranging apparatus, the apparatus comprising:
the shooting module is used for acquiring a target image by using a monocular camera and inputting the target image into a preset target detection model;
the detection module is used for determining a target object in the target image and a target frame corresponding to the target object in the target detection model;
the operation module is used for determining the image height of the target object in the target image according to the two-dimensional coordinates of the target frame; and determining a target distance between the target object and the monocular camera according to the focal length of the monocular camera, the actual height of the target object and the image height.
9. A storage medium having stored thereon a program or instructions which, when executed by a processor, implement the method of any of claims 1 to 7.
10. An electronic device comprising a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 7 when executing the program.
CN202310113266.6A 2023-02-14 2023-02-14 Monocular camera ranging method and device, storage medium and electronic equipment Pending CN116295237A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310113266.6A CN116295237A (en) 2023-02-14 2023-02-14 Monocular camera ranging method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310113266.6A CN116295237A (en) 2023-02-14 2023-02-14 Monocular camera ranging method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN116295237A 2023-06-23

Family

ID=86835097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310113266.6A Pending CN116295237A (en) 2023-02-14 2023-02-14 Monocular camera ranging method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116295237A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination