CN113240750A - Three-dimensional space information measuring and calculating method and device - Google Patents

Three-dimensional space information measuring and calculating method and device

Info

Publication number
CN113240750A
Authority
CN
China
Prior art keywords
target object
dimensional
coordinate system
dimensional information
image data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110522849.5A
Other languages
Chinese (zh)
Inventor
唐勇
梁晶晶
施小东
党诗芽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Shanghai ICT Co Ltd
CM Intelligent Mobility Network Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Shanghai ICT Co Ltd
CM Intelligent Mobility Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Shanghai ICT Co Ltd, CM Intelligent Mobility Network Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202110522849.5A priority Critical patent/CN113240750A/en
Publication of CN113240750A publication Critical patent/CN113240750A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras

Abstract

The application discloses a three-dimensional space information measuring and calculating method and device, belonging to the technical field of computer vision. The three-dimensional space information measuring and calculating method comprises the following steps: acquiring three-dimensional information, under a camera coordinate system, of a target object in image data collected by a monocular camera; and converting the three-dimensional information of the target object in the camera coordinate system into three-dimensional information in a world coordinate system according to the position and deflection angle of the monocular camera. In this scheme, the position and deflection angle of the monocular camera are used to obtain the three-dimensional information of the target object in actual space, enriching the application scenarios of this three-dimensional space information measuring and calculating approach.

Description

Three-dimensional space information measuring and calculating method and device
Technical Field
The application belongs to the field of computer vision, and particularly relates to a three-dimensional space information measuring and calculating method and device.
Background
In the prior art, when a monocular camera and a deep learning algorithm are used to measure and calculate the three-dimensional space information of a target object, only the relative position of the target object can be obtained, that is, its position with the camera as the reference; the actual position of the target object cannot be obtained.
Disclosure of Invention
The embodiments of the present application provide a three-dimensional spatial information measuring and calculating method and device, which can solve the problem that the conventional three-dimensional spatial information measuring and calculating approach can only acquire the relative position of a target object, so that its application scenarios are limited.
In order to solve the above technical problem, an embodiment of the present application provides a three-dimensional spatial information measurement method, including:
acquiring three-dimensional information of a target object in image data acquired by a monocular camera under a camera coordinate system;
and converting the three-dimensional information of the target object in the camera coordinate system into three-dimensional information in a world coordinate system according to the position and the deflection angle of the monocular camera.
Optionally, the deflection angle is a clockwise deflection angle based on a true north direction.
Optionally, the converting the three-dimensional information of the object in the camera coordinate system into the three-dimensional information in the world coordinate system according to the position and the deflection angle of the monocular camera includes:
determining the distance between the target object and the monocular camera according to the three-dimensional information of the target object in the camera coordinate system;
determining three-dimensional information of the target object in a world coordinate system according to the position of the monocular camera and the distance;
and correcting the three-dimensional information of the target object in a world coordinate system according to the deflection angle of the monocular camera.
Optionally, the three-dimensional information includes: coordinates of the target object, size of the target object, deflection angle of the target object, and velocity of the target object.
Optionally, the determining a distance between the target object and the monocular camera according to three-dimensional information of the target object in the camera coordinate system includes:
and determining the distance between the target object and the monocular camera according to the coordinates of the target object.
Optionally, the acquiring three-dimensional information of the target object in the image data acquired by the monocular camera under the camera coordinate system includes:
acquiring image data acquired by a monocular camera;
acquiring a two-dimensional recognition result of a target object in the image according to the image data;
and calculating the three-dimensional information of the identified target object according to the two-dimensional identification result of the target object, and acquiring the three-dimensional information of the target object in a camera coordinate system.
Optionally, the acquiring image data collected by the monocular camera includes:
calibrating a monocular camera to obtain an internal reference matrix of the monocular camera;
and carrying out distortion correction on the video stream data acquired by the monocular camera according to the internal reference matrix to acquire image data.
Optionally, the obtaining a two-dimensional recognition result of the target object in the image according to the image data includes:
acquiring a target object identification model;
inputting the image data into the target object recognition model to obtain a two-dimensional recognition result of the target object;
wherein the two-dimensional recognition result comprises: object type, offset and size of the object.
Optionally, the obtaining the target recognition model includes:
intercepting an original image data set from image data acquired by the monocular camera;
determining a training set from the raw image dataset;
labeling a target object of the image data in the training set to obtain a first image data set with a label;
inputting the first image data set into a full convolution network to obtain a key point thermodynamic diagram;
detecting a target object according to the key points in the key point thermodynamic diagram and outputting a detection result;
and updating the model parameters of the target object identification model according to the label of the first image data and the detection result to obtain the target object identification model.
Optionally, the calculating three-dimensional information of the identified target object according to the two-dimensional identification result of the target object to obtain the three-dimensional information of the target object in the camera coordinate system includes:
inputting the two-dimensional recognition result of the target object into a 3D estimation network model, adding a depth value, a 3D size, and a direction to the center point of the target object through the 3D estimation network model, and obtaining the three-dimensional information of each target object in the image under the camera coordinate system through a depth calculation channel, wherein the depth calculation channel comprises: two convolution layers, a rectified linear unit (ReLU) activation function, and a regression loss function;
wherein the three-dimensional information in the camera coordinate system comprises: coordinates of the object, size of the object, deflection angle of the object.
Optionally, the three-dimensional information further includes: the speed of the target; the three-dimensional information measurement and calculation is carried out on the identified target object according to the two-dimensional identification result of the target object, and the three-dimensional information of the target object under a camera coordinate system is obtained, and the method further comprises the following steps:
tracking a target object in real time;
and determining the speed of the target object according to the positions of the target object at different moments.
The embodiment of the present application further provides a three-dimensional spatial information measuring device, including:
the acquisition module is used for acquiring three-dimensional information of a target object in image data acquired by the monocular camera under a camera coordinate system;
and the conversion module is used for converting the three-dimensional information of the target object in the camera coordinate system into the three-dimensional information in the world coordinate system according to the position and the deflection angle of the monocular camera.
The embodiment of the application also provides a three-dimensional space information measuring and calculating device, which comprises a transceiver and a processor;
the processor is configured to:
acquiring three-dimensional information of a target object in image data acquired by a monocular camera under a camera coordinate system;
and converting the three-dimensional information of the target object in the camera coordinate system into three-dimensional information in a world coordinate system according to the position and the deflection angle of the monocular camera.
The embodiment of the present application further provides a three-dimensional spatial information measuring and calculating device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and is characterized in that the processor implements the steps of the three-dimensional spatial information measuring and calculating method when executing the program.
The embodiment of the present application further provides a readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the three-dimensional spatial information measuring and calculating method.
The beneficial effects of the present application are:
according to the above scheme, the position and deflection angle of the monocular camera are used to acquire, under the world coordinate system, the three-dimensional information of the target object in the image data collected by the monocular camera; the three-dimensional information of the target object in actual space can thus be acquired, enriching the application scenarios of this three-dimensional space information measuring and calculating approach.
Drawings
Fig. 1 is a schematic flow chart of a three-dimensional spatial information measurement method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a detailed implementation of a three-dimensional spatial information measurement method according to an embodiment of the present application;
fig. 3 is a schematic block diagram of a three-dimensional spatial information measuring device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second", and the like in the description and claims of the present application are used to distinguish between similar elements and not necessarily to describe a particular sequence or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances, so that the embodiments of the application can be practiced in sequences other than those illustrated or described herein; moreover, the terms "first", "second", and the like generally denote a class of object and do not limit its number, e.g., the first object can be one object or more than one. In addition, "and/or" in the description and claims means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the preceding and following objects.
The three-dimensional space information measuring and calculating method and device provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
As shown in fig. 1, an embodiment of the present application provides a three-dimensional space information measuring and calculating method applied to a three-dimensional space information measuring and calculating device, including:
step 11, acquiring three-dimensional information of a target object in image data acquired by a monocular camera under a camera coordinate system;
the target object is at least one object, for example, a vehicle is used as the target object, a pedestrian is used as the target object, or both the vehicle and the pedestrian are used as the target object; the three-dimensional information of the target object in the camera coordinate system refers to: three-dimensional information of the target object under the world coordinate origin taken by the camera, namely relative three-dimensional information of the target object.
Step 12, converting the three-dimensional information of the target object in the camera coordinate system into three-dimensional information in a world coordinate system according to the position and the deflection angle of the monocular camera;
the three-dimensional information of the target object in the world coordinate system means actual three-dimensional information of the target object.
It should be noted that, optionally, the position of the monocular camera mainly refers to coordinate position information of the monocular camera acquired by using a Global Positioning System (GPS); the deflection angle refers to a clockwise deflection angle based on the true north direction.
According to the embodiment of the application, the three-dimensional information of the target object acquired by the monocular camera is converted into the world coordinate system from the camera coordinate system by utilizing the position and the deflection angle of the monocular camera, so that the three-dimensional information of the target object in the actual space can be acquired, and the application scene of the three-dimensional space information measuring and calculating mode is enriched.
Optionally, an optional implementation manner of step 11 in the embodiment of the present application is:
step 111, acquiring image data acquired by a monocular camera;
optionally, this step may be implemented as:
1111, calibrating a monocular camera to obtain an internal reference matrix of the monocular camera;
it should be noted that the reference matrix mainly refers to the reference parameters and distortion parameters of the monocular camera.
Step 1112, performing distortion correction on video stream data acquired by the monocular camera according to the internal reference matrix to acquire image data;
the video stream data mainly refers to the video stream data of the road picture captured by the monocular camera at the road side end. The monocular camera at the road side end is used for coding the video in an H.265 coding format based on a real-time stream transmission protocol (Rtsp) through real-time shooting and transmitting the video to the three-dimensional spatial information measuring and calculating device, and the three-dimensional spatial information measuring and calculating device is used for acquiring video stream data of a road picture; after the video stream data is acquired, image data that can be used for subsequent processing is acquired by performing radial and longitudinal distortion correction on an image in the video stream data (the video stream data is composed of a plurality of images) by using the internal reference matrix.
Step 112, acquiring a two-dimensional recognition result of the target object in the image according to the image data;
it should be noted that, one implementation manner here is:
step 1121, obtaining a target object identification model;
step 1122, inputting the image data into the target object recognition model to obtain a two-dimensional recognition result of the target object;
wherein the two-dimensional recognition result comprises: the object type of the target object, the offset (i.e., the offset of the target object in the image along the X axis and Y axis), and the size (i.e., the height and width of the target object in the image).
It should be noted that, here, the target object recognition model is directly used to recognize the target object in the image data, and further, an optional obtaining manner of the target object recognition model is as follows:
step 11211, intercepting an original image data set from the image data collected by the monocular camera;
it should be noted that one implementation manner that can be adopted here is: and intercepting a road image data set from the corrected image data by using a frame extraction method based on the image data subjected to distortion correction by the internal reference matrix, wherein the road image data set is an original image data set.
Step 11212, determining a training set from the original image data set;
it should be noted that this step is mainly to obtain a training set from the original image data set, that is, the training set is a part of the original image data set.
Optionally, in order to ensure that the finally obtained target recognition model is accurate, the model generally needs to be tested after it is obtained. That is, when the original image data set is divided, it may be split into a training set and a test set, for example according to a preset ratio: one part is the training set used to train the target recognition model, and the other part is the test set used to optimize and adjust the trained model. For example, in the embodiment of the present application, the original image data set is divided into a training set and a test set at a ratio of 7:3, as in the sketch below.
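A one-line sketch of such a split; scikit-learn is used here purely for illustration, and the frame file names are hypothetical.

```python
from sklearn.model_selection import train_test_split

# Paths of the frames extracted from the corrected video (illustrative).
image_paths = [f"frames/{i:06d}.jpg" for i in range(1000)]
# 7:3 split into training and test sets.
train_paths, test_paths = train_test_split(image_paths, test_size=0.3, random_state=0)
```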
Step 11213, labeling the target object of the image data in the training set to obtain a first image data set with labels;
it should be noted that, an alternative implementation manner of this step is: labeling the target object in the image of the training set by using a picture labeling (labelimage) tool, and generating a labeled image data set (it should be noted that, in the present application, a pre-training model based on nuScenes data is used, so that only part of data needs to be labeled in the labeling process for a specific environment, and the model training is finely tuned based on the data).
Step 11214, inputting the first image data set into a full convolution network to obtain a key point thermodynamic diagram;
it should be noted that, this step is to process the input image I to obtain a keypoint thermodynamic diagram, where each keypoint is represented by formula one as:
formula I,
Figure BDA0003064744030000071
Wherein I ∈ RW×H×3I is the first image data set (i.e. the input image), W is the width of the image, H is the height of the image,
Figure BDA0003064744030000072
is key toPoint, R is output size scaling (stride), and C is the number of types of key points (i.e. output feature map channels, i.e. target categories), for example, in this embodiment, C is 80 for target detection; alternatively,
Figure BDA0003064744030000073
it is indicated that the detected is a key point,
Figure BDA0003064744030000074
indicating that background was detected.
Step 11215, detecting a target object according to the key points in the key point thermodynamic diagram and outputting a detection result;
the step is to utilize the key points to carry out target detection, and order:
Figure BDA0003064744030000075
a bounding box (Bbox) for object k (i.e. object class k), whose center position is:
Figure BDA0003064744030000076
while regressing the size of the target for each target k based on the center position (S)k) I.e. by
Figure BDA0003064744030000077
The entire network prediction will output C +4 values (i.e., keypoint class C, offset X (i.e., X-axis offset), Y (i.e., Y-axis offset), size w (width), h (height)) at each position, while all outputs share a fully-convolutional neural network model (backoff).
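The following PyTorch sketch illustrates this output structure: a shared feature map feeding a C-channel heatmap head plus 2-channel offset and size heads. The channel counts and layer shapes are illustrative assumptions, not the application's exact architecture.

```python
import torch
import torch.nn as nn

class CenterNetHeads(nn.Module):
    """CenterNet-style prediction heads over a shared backbone feature map:
    C heatmap channels plus 4 regression channels (x/y offset, w/h size)."""
    def __init__(self, in_ch: int = 64, num_classes: int = 80):
        super().__init__()
        def make_head(out_ch: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(in_ch, 256, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(256, out_ch, kernel_size=1),
            )
        self.heatmap = make_head(num_classes)  # keypoint heatmap Y_hat
        self.offset = make_head(2)             # center offset along X and Y
        self.size = make_head(2)               # box width w and height h

    def forward(self, feat: torch.Tensor) -> dict:
        return {
            "heatmap": torch.sigmoid(self.heatmap(feat)),
            "offset": self.offset(feat),
            "size": self.size(feat),
        }

heads = CenterNetHeads()
out = heads(torch.randn(1, 64, 128, 128))  # e.g. a 512x512 input at stride R = 4
```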
Step 11216, updating the model parameters of the object recognition model according to the label of the first image data and the detection result to obtain an object recognition model;
after the detection result is obtained, the model parameters need to be updated by using the detection result and the label to obtain the final target recognition model.
After the target object recognition model is obtained, real-time video stream data is fed into it, the target objects in the data are detected, and the center position of each object in the image is obtained; the model parameters and network structure are then modified and optimized on the test set, finally yielding an effective target object recognition model.
It should be noted that, in the embodiment of the present application, the target object recognition model is trained with a deep learning network based on the labeled first image data set combined with the target detection network CenterNet, converting target recognition into a standard keypoint estimation problem: the image is passed through a full convolution network to obtain a heatmap whose peak points are the center points, and the width and height of the target are predicted at each peak position of the heatmap. This effectively avoids the non-maximum suppression (NMS) operation required when recognizing targets with Bboxes, i.e., deleting duplicate detection boxes of the same target by computing the overlap (IoU) between Bboxes.
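This NMS-free behavior can be sketched as follows: a 3x3 max-pool keeps only local maxima of the heatmap, so duplicate boxes never arise and no IoU-based suppression is needed. This is a common CenterNet idiom; the top-k value is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def extract_peaks(heatmap: torch.Tensor, k: int = 100):
    """Keep only local maxima of a (B, C, H, W) keypoint heatmap and
    return the scores and flat indices of the top-k peaks."""
    pooled = F.max_pool2d(heatmap, kernel_size=3, stride=1, padding=1)
    peaks = heatmap * (pooled == heatmap)  # zero out non-maxima
    return torch.topk(peaks.flatten(start_dim=1), k)
```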
113, calculating three-dimensional information of the identified target object according to a two-dimensional identification result of the target object, and acquiring the three-dimensional information of the target object in a camera coordinate system;
it should be noted that one implementation manner that may be adopted in this step is: inputting a two-dimensional recognition result of the target object into a 3D estimation network model, adding a depth value, a 3D size and a direction to a central point of the target object through the 3D estimation network model, and obtaining three-dimensional information of each target object in the image under a camera coordinate system through a depth calculation channel;
wherein the depth calculation channel comprises: two convolution layers, a rectified linear unit (ReLU) activation function, and a regression loss (L1 loss) function.
Through the above operations, the three-dimensional information in the camera coordinate system can be obtained: the coordinates of the object (the coordinate X parallel to the camera lens, the coordinate Y perpendicular to the ground plane, and the coordinate Z along the camera lens direction), the size of the object (the length L, width W, and height H of the object), and the deflection angle of the object (i.e., the clockwise deflection angle perpendicular to the camera direction).
Further, in order to obtain the velocity (unit: m/s) of the target object in the three-dimensional information under the camera coordinate system, an optional implementation of the embodiment of the present application is: tracking the target object in real time, and determining the velocity of the target object according to the positions of the target object at different moments.
For the three-dimensional information measurement and calculation, refer to the Deep3Dbox method: the network model produces the 2D detection results (the center point class $C$, i.e., the recognized object type, the offsets $x, y$, and the size $w, h$), and the 2D detection result is then fed into the 3D estimation network model to output the three-dimensional spatial information of the object. The 3D detection at the center point of each target object performs a three-dimensional Bbox estimation, and each center point requires 3 additional pieces of information: depth (the depth value), 3D dimension (the 3D size), and orientation (the direction), each added by a separate head. The depth value of each center point is a single scalar, expressed by formula two as:

$$d = \frac{1}{\sigma(\hat{d})} - 1 \quad \text{(formula two)}$$

wherein $d$ is the depth value, $\hat{d}$ is the predicted depth value, and $\sigma$ is the sigmoid function. A depth calculation channel (head) is added to the 3D estimation network model, with $\hat{D} \in R^{\frac{W}{R} \times \frac{H}{R}}$ being the predicted depth values of all points on the current channel; the depth calculation head uses two convolution layers, and the depth estimator in the head is trained with a ReLU activation function and L1 loss. The clockwise deflection angle based on the true north direction is likewise regressed with a separate head in the same manner. Meanwhile, an independent head directly regresses the 3D dimensions of the target, i.e., the absolute values of length, width, and height (unit: meters), with $\hat{\Gamma} \in R^{\frac{W}{R} \times \frac{H}{R} \times 3}$ being the predicted three-dimensional (length, width, height) information values.
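A minimal PyTorch sketch of such a depth head, assuming a 64-channel backbone feature map: two convolution layers with a ReLU in between, whose output is decoded with formula two; the L1 training loss is shown as a comment. The layer widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DepthHead(nn.Module):
    """Depth calculation channel: two convolution layers with a ReLU,
    producing one depth prediction per feature-map position."""
    def __init__(self, in_ch: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, kernel_size=1),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        d_hat = self.net(feat)                   # predicted depth logits
        return 1.0 / torch.sigmoid(d_hat) - 1.0  # formula two: d = 1/sigma(d_hat) - 1

# Training sketch: regress decoded depth at object centers with L1 loss, e.g.
# loss = nn.functional.l1_loss(depth[center_mask], depth_gt[center_mask])
```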
As for the speed of the target object: Kalman filtering is used to predict the position information of the target object in the two-dimensional image and obtain the position of the object at the next moment, thereby determining the real-time ID of the object; the tracked object is verified against its three-dimensional spatial information, so that the ID can be checked and adjusted to ensure tracking accuracy. After the position of the target object in the camera coordinate system is acquired, the moving distance of the target object is calculated from the object's positions in consecutive frames, and the speed of the target object is calculated from the duration of each frame.
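Once the tracker has associated an ID with a sequence of camera-coordinate positions, the speed computation itself reduces to distance over time, as in this sketch; the tracking and ID assignment (e.g., via Kalman filtering) are assumed to have happened upstream, and the positions and frame rate are illustrative.

```python
import numpy as np

def estimate_speed(positions, frame_interval: float) -> float:
    """Average speed (m/s) of one tracked object from its camera-coordinate
    positions (N x 3, meters) in consecutive frames, frame_interval seconds apart."""
    positions = np.asarray(positions, dtype=float)
    step_distances = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    return float(step_distances.mean() / frame_interval)

# e.g. two consecutive frames of a 25 fps stream:
speed = estimate_speed([(2.0, 0.0, 15.0), (2.1, 0.0, 14.4)], frame_interval=1 / 25)
```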
Through the above process, the acquired two-dimensional information of the target object can be effectively converted into three-dimensional information, including the three-dimensional size of the target object, the coordinates of the target object relative to the coordinate origin, the deflection angle of the target object, and the speed of the target object.
It should be further noted that, an optional implementation manner of step 12 is:
step 121, determining the distance between the target object and the monocular camera according to the three-dimensional information of the target object in the camera coordinate system;
specifically, one implementation manner that may be adopted in this step is:
and determining the distance between the target object and the monocular camera according to the coordinates of the target object.
Here, the distance between the target object and the monocular camera is acquired in the camera coordinate system, in which the monocular camera is the origin. Given the known coordinates of the target object, the distance between the target object and the origin, i.e., the distance between the target object and the monocular camera, can be calculated from those coordinates.
Step 122, determining three-dimensional information of the target object in a world coordinate system according to the position of the monocular camera and the distance;
in this case, the coordinates of the object in the camera coordinate system are mainly converted into the coordinates in the world coordinate system, that is, the coordinates of the object in the camera coordinate system are converted into the coordinates in the different coordinate systems.
And 123, correcting the three-dimensional information of the target object in a world coordinate system according to the deflection angle of the monocular camera.
In the above process, the GPS position of the camera and the camera's clockwise deflection angle based on the true north direction are used to perform a coordinate transformation on the identified target object: the actual GPS position of the target object (i.e., the actual coordinates of the target object) is calculated from the GPS position of the camera and the distance between the target object and the camera, and the actual deflection angle of the target object is corrected according to the deflection angle of the camera, obtaining the deflection angle of the target object in the actual world coordinate system (clockwise rotation with true north as the starting angle).
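The conversion described above can be sketched as follows, under a local flat-earth approximation; the camera-frame axis convention (x to the right, z along the lens) and the function name are assumptions for illustration, not the application's exact formulation.

```python
import math

EARTH_RADIUS = 6378137.0  # WGS-84 equatorial radius, meters

def camera_to_gps(cam_lat, cam_lon, cam_yaw_deg, x, z):
    """Project a target's camera-frame offset (x right, z forward, meters)
    into GPS coordinates, given the camera's GPS position and its clockwise
    deflection angle from true north."""
    yaw = math.radians(cam_yaw_deg)
    # Rotate the camera-frame offset into north/east components.
    north = z * math.cos(yaw) - x * math.sin(yaw)
    east = z * math.sin(yaw) + x * math.cos(yaw)
    # Convert the metric offset into latitude/longitude deltas.
    dlat = math.degrees(north / EARTH_RADIUS)
    dlon = math.degrees(east / (EARTH_RADIUS * math.cos(math.radians(cam_lat))))
    return cam_lat + dlat, cam_lon + dlon

# The target's world deflection angle is corrected the same way, e.g.
# target_yaw_world = (cam_yaw_deg + target_yaw_relative_deg) % 360
```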
As shown in fig. 2, a detailed description of a specific implementation of the embodiment of the present application is provided below.
Step 21, calibrating a roadside camera;
the method mainly comprises the steps of calibrating the monocular camera, and acquiring an internal reference matrix of the camera, wherein the internal reference matrix comprises internal reference parameters and distortion parameters of the camera; the GPS of the camera and the clockwise deflection angle of the camera based on the true north direction (clockwise rotation in the true north direction) are acquired simultaneously.
Step 22, acquiring monocular camera data;
the method comprises the steps that through real-time shooting of a road side end camera, a video is coded and transmitted in an H.265 coding format based on Rtsp, and therefore video stream data of a road picture are obtained; and correcting the radial and longitudinal distortion of the image according to the internal reference matrix of the camera for the acquired video stream data.
Step 23, labeling the target object;
based on the corrected video stream data obtained in step 22, intercepting a road image data set from the corrected video stream data in a frame extraction manner, and calculating a ratio of (7): 3, dividing a training set and a test set in proportion; and labeling the target object in the training set image by using a labelimage labeling tool to generate a labeled image data set.
It should be noted that this step is a step specific to the initial generation of the object recognition model, and is not required to be executed after there is already an object recognition model that can be applied.
Step 24, identifying the target object;
A target object recognition model is trained based on the image data set and the CenterNet deep learning network. After the model is obtained, real-time video stream data is fed into it, the target objects in the data are detected, and the center position of each object in the image is obtained; the model parameters and network structure are then modified and optimized on the test set, finally yielding an effective target object recognition model.
Step 25, measuring and calculating target three-dimensional information;
the target recognition result (which is the two-dimensional recognition result of the object and includes the center point type C (i.e., the recognized object type), the offset and the size) obtained in step 24 is measured and calculated with respect to three-dimensional information, wherein the three-dimensional information mainly includes the three-dimensional size of the target object, the coordinates of the target object, the deflection angle of the target object and the velocity of the target object.
It should be noted that, during the execution of step 24, the target object needs to be tracked in real time to obtain the speed of the target object.
Step 26, converting coordinates of the target object;
after various data of the target object under the camera coordinate system are acquired based on the step 25, coordinate conversion is performed on the identified target object through the camera GPS acquired in the step 21 and the clockwise deflection angle of the camera based on the due north direction, the actual GPS of the target object is calculated according to the GPS of the camera and the distance between the target object and the camera, and the actual deflection angle of the target object is corrected according to the deflection angle of the camera to acquire the deflection angle of the target object under the actual world coordinate system (clockwise rotation is performed by taking due north as an initial angle).
In summary, the embodiment of the present application can achieve the following technical effects:
1. The present application does not require coordinate labeling of the vertices of all data; only a small amount of data for the specific environment needs to be labeled;
2. The present application does not require calibration of the camera's extrinsic parameters, which reduces the influence of the extrinsics on the result and also reduces labor cost;
3. The video stream can be processed in real time and target recognition performed on multiple images, expanding the usage scenarios;
4. In addition to three-dimensional spatial information, the present application can effectively acquire various three-dimensional data of objects in actual space, so its application scenarios are wider.
It should be noted that, in the three-dimensional space information measuring and calculating method provided in the embodiment of the present application, the execution main body may be a three-dimensional space information measuring and calculating device, or a control module in the three-dimensional space information measuring and calculating device for executing the loaded three-dimensional space information measuring and calculating method. In the embodiment of the present application, a three-dimensional spatial information measuring and calculating method implemented by a three-dimensional spatial information measuring and calculating device is taken as an example to describe the three-dimensional spatial information measuring and calculating method provided in the embodiment of the present application.
As shown in fig. 3, an embodiment of the present application further provides a three-dimensional spatial information measuring device 30, including:
the acquiring module 31 is configured to acquire three-dimensional information of a target object in image data acquired by a monocular camera in a camera coordinate system;
and the conversion module 32 is configured to convert the three-dimensional information of the target object in the camera coordinate system into three-dimensional information in a world coordinate system according to the position and the deflection angle of the monocular camera.
Optionally, the deflection angle is a clockwise deflection angle based on a true north direction.
Optionally, the conversion module 32 includes:
the first acquisition unit is used for determining the distance between the target object and the monocular camera according to the three-dimensional information of the target object in the camera coordinate system;
the first determining unit is used for determining three-dimensional information of the target object in a world coordinate system according to the position of the monocular camera and the distance;
and the correcting unit is used for correcting the three-dimensional information of the target object in a world coordinate system according to the deflection angle of the monocular camera.
Optionally, the three-dimensional information includes: coordinates of the target object, size of the target object, deflection angle of the target object, and velocity of the target object.
Optionally, the first obtaining unit is configured to:
and determining the distance between the target object and the monocular camera according to the coordinates of the target object.
Optionally, the obtaining module 31 includes:
the second acquisition unit is used for acquiring image data acquired by the monocular camera;
the third acquisition unit is used for acquiring a two-dimensional recognition result of the target object in the image according to the image data;
and the fourth acquisition unit is used for measuring and calculating the three-dimensional information of the identified target object according to the two-dimensional identification result of the target object and acquiring the three-dimensional information of the target object in a camera coordinate system.
Optionally, the second obtaining unit specifically includes:
the first acquisition subunit is used for calibrating the monocular camera and acquiring an internal reference matrix of the monocular camera;
and the second acquisition subunit is used for carrying out distortion correction on the video stream data acquired by the monocular camera according to the internal reference matrix to acquire image data.
Optionally, the third obtaining unit includes:
the third acquisition subunit is used for acquiring a target object recognition model;
the fourth acquisition subunit is used for inputting the image data into the target object recognition model and acquiring a two-dimensional recognition result of the target object;
wherein the two-dimensional recognition result comprises: object type, offset and size of the object.
Optionally, the third obtaining subunit is specifically configured to:
intercepting an original image data set from image data acquired by the monocular camera;
determining a training set from the raw image dataset;
labeling a target object of the image data in the training set to obtain a first image data set with a label;
inputting the first image data set into a full convolution network to obtain a key point thermodynamic diagram;
detecting a target object according to the key points in the key point thermodynamic diagram and outputting a detection result;
and updating the model parameters of the target object identification model according to the label of the first image data and the detection result to obtain the target object identification model.
Optionally, the fourth obtaining unit is specifically configured to:
inputting the two-dimensional recognition result of the target object into a 3D estimation network model, adding a depth value, a 3D size, and a direction to the center point of the target object through the 3D estimation network model, and obtaining the three-dimensional information of each target object in the image under the camera coordinate system through a depth calculation channel, wherein the depth calculation channel comprises: two convolution layers, a rectified linear unit (ReLU) activation function, and a regression loss function;
wherein the three-dimensional information in the camera coordinate system comprises: coordinates of the object, size of the object, deflection angle of the object.
Optionally, the three-dimensional information further includes: the speed of the target; the fourth obtaining unit is further configured to:
tracking a target object in real time;
and determining the speed of the target object according to the positions of the target object at different moments.
It should be noted that the three-dimensional spatial information measuring and calculating device provided in the embodiment of the present application can implement each process implemented by the three-dimensional spatial information measuring and calculating method in the method embodiment of fig. 1, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The three-dimensional space information measuring and calculating device in the embodiment of the present application may be a device having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The embodiment of the invention also provides a three-dimensional space information measuring and calculating device, which comprises a transceiver and a processor;
the processor is configured to:
acquiring three-dimensional information of a target object in image data acquired by a monocular camera under a camera coordinate system;
and converting the three-dimensional information of the target object in the camera coordinate system into three-dimensional information in a world coordinate system according to the position and the deflection angle of the monocular camera.
Optionally, the deflection angle is a clockwise deflection angle based on a true north direction.
Optionally, the processor is further configured to implement:
determining the distance between the target object and the monocular camera according to the three-dimensional information of the target object in the camera coordinate system;
determining three-dimensional information of the target object in a world coordinate system according to the position of the monocular camera and the distance;
and correcting the three-dimensional information of the target object in a world coordinate system according to the deflection angle of the monocular camera.
Optionally, the three-dimensional information includes: coordinates of the target object, size of the target object, deflection angle of the target object, and velocity of the target object.
Optionally, the processor is further configured to implement:
and determining the distance between the target object and the monocular camera according to the coordinates of the target object.
Optionally, the processor is further configured to implement:
acquiring image data acquired by a monocular camera;
acquiring a two-dimensional recognition result of a target object in the image according to the image data;
and calculating the three-dimensional information of the identified target object according to the two-dimensional identification result of the target object, and acquiring the three-dimensional information of the target object in a camera coordinate system.
Optionally, the processor is further configured to implement:
calibrating a monocular camera to obtain an internal reference matrix of the monocular camera;
and carrying out distortion correction on the video stream data acquired by the monocular camera according to the internal reference matrix to acquire image data.
Optionally, the processor is further configured to implement:
acquiring a target object identification model;
inputting the image data into the target object recognition model to obtain a two-dimensional recognition result of the target object;
wherein the two-dimensional recognition result comprises: object type, offset and size of the object.
Optionally, the processor is further configured to implement:
intercepting an original image data set from image data acquired by the monocular camera;
determining a training set from the raw image dataset;
labeling a target object of the image data in the training set to obtain a first image data set with a label;
inputting the first image data set into a full convolution network to obtain a key point thermodynamic diagram;
detecting a target object according to the key points in the key point thermodynamic diagram and outputting a detection result;
and updating the model parameters of the target object identification model according to the label of the first image data and the detection result to obtain the target object identification model.
Optionally, the processor is further configured to implement:
inputting the two-dimensional recognition result of the target object into a 3D estimation network model, adding a depth value, a 3D size, and a direction to the center point of the target object through the 3D estimation network model, and obtaining the three-dimensional information of each target object in the image under the camera coordinate system through a depth calculation channel, wherein the depth calculation channel comprises: two convolution layers, a rectified linear unit (ReLU) activation function, and a regression loss function;
wherein the three-dimensional information in the camera coordinate system comprises: coordinates of the object, size of the object, deflection angle of the object.
Optionally, the three-dimensional information further includes: the speed of the target; the processor is further configured to implement:
tracking a target object in real time;
and determining the speed of the target object according to the positions of the target object at different moments.
The embodiment of the invention also provides a three-dimensional space information measuring and calculating device, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the steps of the three-dimensional space information measuring and calculating method when executing the program.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements each process in the foregoing three-dimensional spatial information measuring and calculating method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved, e.g., the methods described may be performed in an order different than that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (15)

1. A three-dimensional space information measuring and calculating method is characterized by comprising the following steps:
acquiring three-dimensional information of a target object in image data acquired by a monocular camera under a camera coordinate system;
and converting the three-dimensional information of the target object in the camera coordinate system into three-dimensional information in a world coordinate system according to the position and the deflection angle of the monocular camera.
2. The method of claim 1, wherein the deflection angle is a clockwise deflection angle based on true north.
3. The method according to claim 1, wherein the converting the three-dimensional information of the object in the camera coordinate system into the three-dimensional information in the world coordinate system according to the position and the deflection angle of the monocular camera comprises:
determining the distance between the target object and the monocular camera according to the three-dimensional information of the target object in the camera coordinate system;
determining three-dimensional information of the target object in a world coordinate system according to the position of the monocular camera and the distance;
and correcting the three-dimensional information of the target object in a world coordinate system according to the deflection angle of the monocular camera.
4. The method of claim 3, wherein the three-dimensional information comprises: coordinates of the target object, size of the target object, deflection angle of the target object, and velocity of the target object.
5. The method of claim 4, wherein determining the distance between the object and the monocular camera according to the three-dimensional information of the object in the camera coordinate system comprises:
and determining the distance between the target object and the monocular camera according to the coordinates of the target object.
6. The method of claim 1, wherein the acquiring three-dimensional information of the object in the image data acquired by the monocular camera in the camera coordinate system comprises:
acquiring image data acquired by a monocular camera;
acquiring a two-dimensional recognition result of a target object in the image according to the image data;
and calculating the three-dimensional information of the identified target object according to the two-dimensional identification result of the target object, and acquiring the three-dimensional information of the target object in a camera coordinate system.
7. The method of claim 6, wherein said obtaining image data captured by a monocular camera comprises:
calibrating a monocular camera to obtain an internal reference matrix of the monocular camera;
and carrying out distortion correction on the video stream data acquired by the monocular camera according to the internal reference matrix to acquire image data.
8. The method of claim 6, wherein obtaining a two-dimensional recognition result of the object in the image from the image data comprises:
acquiring a target object identification model;
inputting the image data into the target object recognition model to obtain a two-dimensional recognition result of the target object;
wherein the two-dimensional recognition result comprises: object type, offset and size of the object.
9. The method of claim 8, wherein the obtaining the object recognition model comprises:
intercepting an original image data set from image data acquired by the monocular camera;
determining a training set from the raw image dataset;
labeling a target object of the image data in the training set to obtain a first image data set with a label;
inputting the first image data set into a full convolution network to obtain a key point thermodynamic diagram;
detecting a target object according to the key points in the key point thermodynamic diagram and outputting a detection result;
and updating the model parameters of the target object identification model according to the label of the first image data and the detection result to obtain the target object identification model.
10. The method according to claim 6, wherein the calculating of the three-dimensional information of the recognized target object according to the two-dimensional recognition result of the target object to obtain the three-dimensional information of the target object in the camera coordinate system comprises:
inputting a two-dimensional recognition result of a target object into a 3D estimation network model, adding a depth value, a 3D size and a direction to a central point of the target object through the 3D estimation network model, and obtaining three-dimensional information of each target object in an image under a camera coordinate system through a depth calculation channel, wherein the depth calculation channel comprises: two convolution layers, a rectified linear unit (ReLU) activation function and a regression loss function;
wherein the three-dimensional information in the camera coordinate system comprises: coordinates of the object, size of the object, deflection angle of the object.
11. The method of claim 10, wherein the three-dimensional information further comprises: the speed of the target; the three-dimensional information measurement and calculation is carried out on the identified target object according to the two-dimensional identification result of the target object, and the three-dimensional information of the target object under a camera coordinate system is obtained, and the method further comprises the following steps:
tracking a target object in real time;
and determining the speed of the target object according to the positions of the target object at different moments.
12. A three-dimensional space information measuring and calculating device, comprising:
an acquisition module configured to acquire three-dimensional information, in a camera coordinate system, of a target object in image data captured by a monocular camera;
and a conversion module configured to convert the three-dimensional information of the target object in the camera coordinate system into three-dimensional information in a world coordinate system according to the position and deflection angle of the monocular camera.
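The conversion module of claim 12 applies a rigid transform built from the camera's position and deflection angle; a simplified yaw-only sketch (the claim itself does not restrict the rotation to a single axis):

```python
import numpy as np

def camera_to_world(p_cam: np.ndarray, cam_pos: np.ndarray, yaw_deg: float) -> np.ndarray:
    """Rotate by the camera's deflection angle, then translate by its position."""
    a = np.radians(yaw_deg)
    R = np.array([[np.cos(a), -np.sin(a), 0.0],
                  [np.sin(a),  np.cos(a), 0.0],
                  [0.0,        0.0,       1.0]])
    return R @ p_cam + cam_pos

# Target at 12 m depth in camera coordinates; the camera pose is illustrative.
p_world = camera_to_world(np.array([1.0, 0.0, 12.0]),
                          cam_pos=np.array([300.0, 150.0, 6.0]),
                          yaw_deg=30.0)
```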
13. A three-dimensional space information measuring and calculating device, comprising a transceiver and a processor;
the processor is configured to:
acquire three-dimensional information, in a camera coordinate system, of a target object in image data captured by a monocular camera;
and convert the three-dimensional information of the target object in the camera coordinate system into three-dimensional information in a world coordinate system according to the position and deflection angle of the monocular camera.
14. A three-dimensional space information measuring and calculating device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the three-dimensional space information measuring and calculating method according to any one of claims 1 to 11.
15. A readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of the three-dimensional space information measuring and calculating method according to any one of claims 1 to 11.
CN202110522849.5A 2021-05-13 2021-05-13 Three-dimensional space information measuring and calculating method and device Pending CN113240750A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110522849.5A CN113240750A (en) 2021-05-13 2021-05-13 Three-dimensional space information measuring and calculating method and device

Publications (1)

Publication Number Publication Date
CN113240750A 2021-08-10

Family

ID=77134014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110522849.5A Pending CN113240750A (en) 2021-05-13 2021-05-13 Three-dimensional space information measuring and calculating method and device

Country Status (1)

Country Link
CN (1) CN113240750A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019233286A1 (en) * 2018-06-05 2019-12-12 北京市商汤科技开发有限公司 Visual positioning method and apparatus, electronic device and system
CN109063301A (en) * 2018-07-24 2018-12-21 杭州师范大学 Gestures of object estimation method in a kind of single image room based on thermodynamic chart
CN110427797A (en) * 2019-05-28 2019-11-08 东南大学 A kind of three-dimensional vehicle detection method based on geometrical condition limitation
CN110390258A (en) * 2019-06-05 2019-10-29 东南大学 Image object three-dimensional information mask method
CN110689008A (en) * 2019-09-17 2020-01-14 大连理工大学 Monocular image-oriented three-dimensional object detection method based on three-dimensional reconstruction
CN111964680A (en) * 2020-07-29 2020-11-20 中国安全生产科学研究院 Real-time positioning method of inspection robot
CN112184914A (en) * 2020-10-27 2021-01-05 北京百度网讯科技有限公司 Method and device for determining three-dimensional position of target object and road side equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115880470A (en) * 2023-03-08 2023-03-31 深圳佑驾创新科技有限公司 Method, device and equipment for generating 3D image data and storage medium
CN117102725A (en) * 2023-10-25 2023-11-24 湖南大学 Welding method and system for steel-concrete combined structure connecting piece
CN117102725B (en) * 2023-10-25 2024-01-09 湖南大学 Welding method and system for steel-concrete combined structure connecting piece

Similar Documents

Publication Publication Date Title
CA3028653C (en) Methods and systems for color point cloud generation
CN107481292B (en) Attitude error estimation method and device for vehicle-mounted camera
CN109523597B (en) Method and device for calibrating external parameters of camera
CN111830953B (en) Vehicle self-positioning method, device and system
CN108921925B (en) Semantic point cloud generation method and device based on laser radar and visual fusion
KR102249769B1 (en) Estimation method of 3D coordinate value for each pixel of 2D image and autonomous driving information estimation method using the same
CN111340797A (en) Laser radar and binocular camera data fusion detection method and system
AU2018286592A1 (en) Systems and methods for correcting a high-definition map based on detection of obstructing objects
CN114359181B (en) Intelligent traffic target fusion detection method and system based on image and point cloud
CN105335955A (en) Object detection method and object detection apparatus
CN113029128B (en) Visual navigation method and related device, mobile terminal and storage medium
CN113240750A (en) Three-dimensional space information measuring and calculating method and device
KR20140054710A (en) Apparatus and method for generating 3d map
CN112906777A (en) Target detection method and device, electronic equipment and storage medium
CN114919584A (en) Motor vehicle fixed point target distance measuring method and device and computer readable storage medium
CN117079238A (en) Road edge detection method, device, equipment and storage medium
CN114494466B (en) External parameter calibration method, device and equipment and storage medium
WO2020113425A1 (en) Systems and methods for constructing high-definition map
CN111488762A (en) Lane-level positioning method and device and positioning equipment
CN114004957A (en) Augmented reality picture generation method, device, equipment and storage medium
CN116917936A (en) External parameter calibration method and device for binocular camera
JP6593995B2 (en) Airport monitoring device
CN116736322B (en) Speed prediction method integrating camera image and airborne laser radar point cloud data
CN117541655B (en) Method for eliminating radar map building z-axis accumulated error by fusion of visual semantics
CN117649619B (en) Unmanned aerial vehicle visual navigation positioning recovery method, system, device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210810