CN113643359A - Target object positioning method, device, equipment and storage medium - Google Patents

Target object positioning method, device, equipment and storage medium

Info

Publication number
CN113643359A
Authority
CN
China
Prior art keywords
target object
camera
value
calculating
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110986953.XA
Other languages
Chinese (zh)
Inventor
郑义
杨庆雄
韩旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Weride Technology Co Ltd
Original Assignee
Guangzhou Weride Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Weride Technology Co Ltd filed Critical Guangzhou Weride Technology Co Ltd
Priority to CN202110986953.XA priority Critical patent/CN113643359A/en
Publication of CN113643359A publication Critical patent/CN113643359A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30248 Vehicle exterior or interior
    • G06T2207/30252 Vehicle exterior; Vicinity of vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target object positioning method, apparatus, device, and storage medium. After a target object is detected from an environment image, a first depth prediction value of the target object in a camera coordinate system is calculated; the first depth prediction value is then corrected to obtain a second depth prediction value, and the coordinates of the target object in the camera coordinate system are calculated based on the center point of the target object and the second depth prediction value. Correcting the depth value of the target object in the camera coordinate system and calculating the coordinates of the target object from the corrected depth value improves the positional accuracy of the target object.

Description

Target object positioning method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of automatic driving, in particular to a target object positioning method, device, equipment and storage medium.
Background
An autonomous vehicle is a new type of intelligent vehicle that senses its surrounding environment through on-board sensors and acquires environment information. The environment information is computed and analyzed by a control device (i.e., the vehicle-mounted intelligent brain), which then controls the various devices in the autonomous vehicle by sending instructions to the Electronic Control Unit (ECU), thereby realizing fully automatic operation of the vehicle and achieving the goal of unmanned driving.
Perception of external target objects (e.g., obstacles) is one of the most important functions of automatic driving, since it allows an autonomous vehicle to avoid obstacles in time according to the perceived target objects and thus prevent accidents. The sensors most commonly used for perceiving target objects are lidar and cameras: lidar has the disadvantage of high cost, while camera-based detection yields large errors in the estimated position of the target object.
Disclosure of Invention
The invention provides a target object positioning method, a target object positioning device, target object positioning equipment and a storage medium, which are used for improving the position accuracy of a target object.
In a first aspect, an embodiment of the present invention provides a target object positioning method, including:
calculating a first depth prediction value of a target object in a camera coordinate system based on the target object detected from an environment image;
correcting the first depth predicted value to obtain a second depth predicted value;
calculating coordinates of the target object in the camera coordinate system based on the center point of the target object and the second depth prediction value.
In a second aspect, an embodiment of the present invention further provides a target object positioning apparatus, including:
a first depth prediction value calculation module, configured to calculate a first depth prediction value of a target object in a camera coordinate system based on the target object detected from an environment image;
the correction module is used for correcting the first depth predicted value to obtain a second depth predicted value;
a coordinate determination module to calculate coordinates of the target object in the camera coordinate system based on the center point of the target object and the second depth prediction value.
In a third aspect, an embodiment of the present invention further provides a computer device, including:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the target object positioning method provided by the first aspect of the invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the target object positioning method according to the first aspect of the present invention.
According to the target object positioning method provided by the embodiment of the invention, after the target object is detected from the environment image, the first depth predicted value of the target object in the camera coordinate system is calculated, then, the first depth predicted value is corrected to obtain the second depth predicted value, and then, the coordinate of the target object in the camera coordinate system is calculated based on the central point of the target object and the second depth predicted value. The position accuracy of the target object is improved by correcting the depth value of the target object in the camera coordinate system and calculating the coordinate of the target object in the camera coordinate system based on the corrected depth value.
Drawings
Fig. 1 is a flowchart of a target object positioning method according to an embodiment of the present invention;
fig. 2A is a flowchart of a target object positioning method according to a second embodiment of the present invention;
FIG. 2B is an imaging schematic of the camera;
fig. 3 is a schematic structural diagram of a target object positioning apparatus according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a target object positioning method according to an embodiment of the present invention. This embodiment is applicable to cases where the camera coordinates of a target object are calculated from an environment image acquired by a camera. The method may be executed by the target object positioning apparatus provided by an embodiment of the present invention, which may be implemented in software and/or hardware and is generally configured in a computer device. As shown in Fig. 1, the method specifically includes the following steps:
s101, calculating a first depth predicted value of a target object in a camera coordinate system based on the target object detected from the environment image.
In the embodiment of the invention, the environment image is an image, within the visual range of the camera, acquired by the unmanned device while it is traveling. Illustratively, in the embodiment of the present invention, the camera is a monocular camera. A common target detection method, such as a Fast R-CNN or YOLO (You Only Look Once) series model, may be adopted to detect the target object from the environment image. The target object may be an obstacle along the course of the unmanned device; for example, for an unmanned vehicle, the target object may be another vehicle, a pedestrian, or another object temporarily appearing on the road surface, which is not limited by the embodiment of the present invention.
After the target object is detected, i.e., once the pixel coordinates of the target object in the environment image are determined, a first depth prediction value of the target object in the camera coordinate system can be calculated based on the imaging principle of the camera in combination with the camera parameters. Specifically, the road surface may be assumed to be a horizontal plane, and the point of the target object with the maximum Y-axis value in the pixel coordinate system is taken as the grounding point; the first depth prediction value of the target object in the camera coordinate system is then calculated from this point in combination with the camera parameters and the height of the camera. Here, the depth of the target object in the camera coordinate system is the horizontal distance of the target object from the origin of the camera coordinate system.
And S102, correcting the first depth predicted value to obtain a second depth predicted value.
The first depth prediction value is calculated based on the grounding point of the target object. However, the grounding point is not the center point of the target object on the Y axis of the pixel coordinate system, which introduces a certain error between the calculated first depth prediction value and the actual depth value. To eliminate this error, the first depth prediction value needs to be corrected to obtain a second depth prediction value.
And S103, calculating the coordinates of the target object in the camera coordinate system based on the central point of the target object and the second depth predicted value.
The central point of the target object may be a central point of the target object in the pixel coordinate system, that is, a 2-dimensional projection of the central point of the 3-dimensional target object in the environment image. Based on the imaging principle of the camera, the X-axis coordinate and the Y-axis coordinate of the target object in the camera coordinate system can be calculated by combining the central point of the target object, the second depth prediction value and the camera parameter, so that the 3-dimensional coordinate of the target object in the camera coordinate system is obtained.
According to the target object positioning method provided by the embodiment of the invention, after the target object is detected from the environment image, the first depth predicted value of the target object in the camera coordinate system is calculated, then, the first depth predicted value is corrected to obtain the second depth predicted value, and then, the coordinate of the target object in the camera coordinate system is calculated based on the central point of the target object and the second depth predicted value. The position accuracy of the target object is improved by correcting the depth value of the target object in the camera coordinate system and calculating the coordinate of the target object in the camera coordinate system based on the corrected depth value.
Example two
Fig. 2A is a flowchart of a target object positioning method according to a second embodiment of the present invention. This embodiment refines the first embodiment and details the specific implementation of each step of the target object positioning method. As shown in Fig. 2A, the method includes:
s201, acquiring an environment image acquired by a camera of the unmanned equipment.
In the embodiment of the invention, the unmanned equipment can be an unmanned vehicle, the unmanned vehicle is provided with a camera, and the environment image is an image which is acquired by the unmanned equipment in the traveling process and is within the visible range of the camera. Illustratively, in the embodiment of the present invention, the camera is a monocular camera.
S202, a target object is detected from the environment image, and the target object is shown in the environment image by a rectangular frame.
For example, in the embodiment of the present invention, a common target detection method, such as a Fast R-CNN or YOLO (You Only Look Once) series model, may be adopted to detect the target object from the environment image. The target object may be an obstacle along the course of the unmanned device; for example, for an unmanned vehicle, the target object may be another vehicle, a pedestrian, or another object temporarily appearing on the road surface, which is not limited by the embodiment of the present invention. The detected target object is usually marked by a rectangular frame, and the pixel coordinates of the rectangular frame in the environment image can be represented as (xmin, ymin, xmax, ymax), where xmin and xmax are the left and right boundaries of the rectangular frame, and ymin and ymax are the upper and lower boundaries of the rectangular frame.
S203, calculating a first depth predicted value of the target object in a camera coordinate system based on the pixel coordinates of the rectangular frame and the camera parameters.
Fig. 2B is an imaging schematic diagram of the camera; reference is made to Fig. 2B. After the pixel coordinates of the rectangular frame are obtained, the first depth prediction value of the target object in the camera coordinate system is calculated based on the imaging principle of the camera, the pixel coordinates of the rectangular frame, and the camera parameters. The camera parameters are the intrinsic parameters of the camera, which can be represented by the following matrix:
[ fx  0   cx ]
[ 0   fy  cy ]
[ 0   0   1  ]

where fx is the focal length of the camera in the X direction of the camera coordinate system, fy is the focal length in the Y direction of the camera coordinate system, cx is the offset of the camera optical axis relative to the coordinate center of the projection plane on the X axis, and cy is the offset of the camera optical axis relative to the coordinate center of the projection plane on the Y axis.
Specifically, the offset cy of the camera optical axis relative to the coordinate center of the projection plane on the Y axis is subtracted from the maximum Y-axis value ymax of the pixel coordinates of the rectangular frame to obtain a first value. The quotient of the camera focal length fy on the Y axis and the first value is calculated to obtain a second value. The product of the second value and the height h of the camera above the ground is calculated to obtain a third value. The product of the second value and the tangent of the inclination angle α of the camera relative to the ground is calculated to obtain a fourth value. The quotient of the third value and the fourth value is calculated to obtain a fifth value. The sum of the third value and the fifth value is calculated to obtain the first depth prediction value Dref of the target object in the camera coordinate system.
The mathematical expression of the above calculation process is:
Dref = t*h + (t*h)/(t*tan(α))
t = fy/(ymax - cy)
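For readability, the step-by-step calculation above can be transcribed into a short routine. The following Python sketch is only a literal transcription of the described steps; the function and argument names are illustrative and not part of the patent:

```python
import math

def first_depth_prediction(y_max: float, f_y: float, c_y: float,
                           cam_height: float, tilt_alpha: float) -> float:
    """Transcription of the calculation steps described above.

    y_max      -- maximum Y-axis pixel coordinate of the rectangular frame (grounding point)
    f_y        -- camera focal length on the Y axis, in pixels
    c_y        -- offset of the optical axis from the projection-plane center on the Y axis
    cam_height -- height h of the camera above the ground
    tilt_alpha -- inclination angle of the camera relative to the ground, in radians
    """
    first = y_max - c_y                      # first value
    second = f_y / first                     # second value t
    third = second * cam_height              # third value
    fourth = second * math.tan(tilt_alpha)   # fourth value
    fifth = third / fourth                   # fifth value
    return third + fifth                     # first depth prediction value Dref
```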
s204, training a correction parameter output model in advance based on the first depth predicted value of the target object in the camera coordinate system and the known actual depth value.
The lower boundary ymax of the rectangular frame predicted by the target detection algorithm in the previous step is not always accurate, and the target object has a certain height, so the center point of the target object is not its grounding point. For these reasons, the first depth prediction value Dref calculated in the above step needs to be corrected.
For example, in the embodiment of the invention, a correction parameter is calculated using a deep learning model, and the correction parameter is then used to correct the first depth prediction value Dref.
Specifically, a correction parameter output model is established based on initial model parameters, and a large number of data samples are collected in advance, where each data sample includes a first depth prediction value of a target object and the known actual depth value of that target object. The correction parameter output model is trained iteratively with these data samples until the model converges (i.e., until the error between the first depth prediction value and the actual depth value meets the requirement).
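The patent does not specify the architecture of the correction parameter output model. The PyTorch sketch below is only an assumed minimal setup (a small fully connected regressor trained with mean squared error) to illustrate the training procedure described above, using the ratio between the actual and predicted depth as the regression target (the correction parameter δ defined in the next step):

```python
import torch
import torch.nn as nn

# Assumed architecture: a small MLP mapping Dref to a correction parameter delta.
correction_model = nn.Sequential(
    nn.Linear(1, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

optimizer = torch.optim.Adam(correction_model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_correction_model(d_ref: torch.Tensor, d_gt: torch.Tensor, epochs: int = 100):
    """d_ref, d_gt: (N, 1) tensors of first depth predictions and actual depth values."""
    target_delta = d_gt / d_ref          # regression target: delta = Dgt / Dref
    for _ in range(epochs):
        optimizer.zero_grad()
        pred_delta = correction_model(d_ref)
        loss = loss_fn(pred_delta, target_delta)
        loss.backward()
        optimizer.step()
```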
S205, inputting the first depth prediction value of the target object in the camera coordinate system into the trained correction parameter output model for processing to obtain a correction parameter.
In practical application, the first depth prediction value Dref of the target object in the camera coordinate system obtained in the previous step is input into the trained correction parameter output model for processing to obtain a correction parameter δ. The correction parameter is defined as follows:
δ=Dgt/Dref
where Dgt is the actual depth value of the target object.
S206, correcting the first depth predicted value by adopting the correction parameters to obtain a second depth predicted value.
Illustratively, in the embodiment of the invention, the first depth prediction value Dref is multiplied by the correction parameter δ to obtain the second depth prediction value Dpre. The specific calculation formula is as follows:
Dpre=δ*Dref
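Continuing the assumed sketch above (same correction_model and torch import), applying the correction at inference time then reduces to a single multiplication:

```python
def corrected_depth(d_ref: float) -> float:
    """Apply the predicted correction parameter to obtain Dpre = delta * Dref."""
    with torch.no_grad():
        delta = correction_model(torch.tensor([[d_ref]], dtype=torch.float32)).item()
    return delta * d_ref
```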
and S207, detecting the central point of the target object from the environment image.
In the embodiment of the present invention, the central point of the target object may be directly predicted by using a deep neural network, or the central point of the rectangular frame obtained in the foregoing step may be subjected to position correction to obtain the central point of the target object, which is not limited herein.
Illustratively, in some embodiments of the invention, the environment image is processed using a CenterNet target detection model to obtain the center point of the target object.
Traditional models such as YOLO and Fast R-CNN rely on a large number of anchors to detect targets; the number of anchors is huge and their sizes are designed manually, so detection speed and accuracy are limited. CenterNet is an anchor-free target detection model that can directly predict the center point of a target object, improving target detection efficiency and the positional accuracy of the center point.
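As an illustration of the anchor-free idea only (this is not the actual CenterNet API), the center point can be read off a predicted center heatmap by taking its peak. A minimal sketch, assuming a heatmap of shape (H, W) and a known downsampling stride:

```python
import numpy as np

def extract_center_from_heatmap(heatmap: np.ndarray, stride: int = 4):
    """Return the pixel coordinates of the strongest center-point response.

    heatmap -- (H, W) array of center-point confidences produced by the detector
    stride  -- downsampling factor between the heatmap and the input image
    """
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    # Map the heatmap cell back to input-image pixel coordinates.
    return col * stride, row * stride
```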
Illustratively, in another embodiment of the present invention, a deep learning model is used to calculate the deviation between the center point of the rectangular frame and the actual center point of the target object, and this deviation is then used to correct the position of the center point of the rectangular frame.
Specifically, a deviation output model is established based on initial model parameters, and a large number of data samples are collected in advance, where each data sample includes the center point coordinates of the rectangular frame of a target object and the known actual center point coordinates of that target object. The deviation output model is trained iteratively with these data samples until the model converges (i.e., until the error between the center point coordinates of the rectangular frame and the actual center point coordinates meets the requirement).
In practical application, the pixel coordinates of the center point of the rectangular frame obtained in the target detection step, ((xmax + xmin)/2, (ymax + ymin)/2), are input into the trained deviation output model for processing to obtain the deviation (xoffset, yoffset) between the center point of the rectangular frame and the actual center point of the target object.
And correcting the central point of the rectangular frame based on the deviation to obtain the central point of the target object. Specifically, the calculation formula is as follows:
xp = (xmax + xmin)/2 + xoffset
yp = (ymax + ymin)/2 + yoffset
where xp is the X-axis coordinate of the corrected center point of the target object in the pixel coordinate system, and yp is the Y-axis coordinate of the corrected center point of the target object in the pixel coordinate system.
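The correction of the rectangular-frame center by the predicted deviation can likewise be transcribed directly. The deviation output model itself is not specified here, so the sketch below simply assumes a callable that returns (xoffset, yoffset):

```python
def corrected_center_point(x_min, y_min, x_max, y_max, deviation_model):
    """Correct the rectangular-frame center with a predicted deviation.

    deviation_model -- assumed callable mapping the frame-center pixel
                       coordinates to the deviation (xoffset, yoffset)
    """
    cx_box = (x_max + x_min) / 2.0
    cy_box = (y_max + y_min) / 2.0
    x_offset, y_offset = deviation_model(cx_box, cy_box)
    x_p = cx_box + x_offset   # xp
    y_p = cy_box + y_offset   # yp
    return x_p, y_p
```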
And S208, calculating the X-axis coordinate of the central point of the target object in the camera coordinate system based on the X-axis coordinate of the pixel coordinate of the central point of the target object, the camera parameter and the second depth prediction value.
After the pixel coordinate of the central point of the target object and the second depth prediction value are obtained, based on the imaging principle of the camera, the X-axis coordinate of the central point of the target object in a camera coordinate system is calculated according to the X-axis coordinate of the pixel coordinate of the central point of the target object, the camera parameters and the second depth prediction value.
Specifically, the difference between the X-axis coordinate of the pixel coordinate of the center point of the target object and the offset of the optical axis of the camera with respect to the coordinate center of the projection plane on the X-axis is calculated to obtain a sixth numerical value. And calculating the product of the sixth numerical value and the focal length of the camera on the X axis to obtain a seventh numerical value. And calculating the quotient of the seventh numerical value and the second depth predicted value to obtain the X-axis coordinate of the central point of the target object in the camera coordinate system.
The mathematical expression of the above calculation process is:
xd=fx*(xp-cx)/Dpre
where xd is the X-axis coordinate of the center point of the target object in the camera coordinate system.
S209, calculating the Y-axis coordinate of the central point of the target object in the camera coordinate system based on the Y-axis coordinate of the pixel coordinate of the central point of the target object, the camera parameter and the second depth prediction value.
Similarly, after the pixel coordinate of the central point of the target object and the second depth prediction value are obtained, based on the imaging principle of the camera, the Y-axis coordinate of the central point of the target object in the camera coordinate system is calculated according to the Y-axis coordinate of the pixel coordinate of the central point of the target object, the camera parameter and the second depth prediction value.
Specifically, the difference between the Y-axis coordinate of the pixel coordinate of the center point of the target object and the offset of the optical axis of the camera with respect to the coordinate center of the projection plane on the Y-axis is calculated to obtain a ninth value. And calculating the product of the ninth value and the focal length of the camera on the Y axis to obtain a tenth value. And calculating the quotient of the tenth numerical value and the second depth predicted value to obtain the Y-axis coordinate of the central point of the target object in the camera coordinate system.
The mathematical expression of the above calculation process is:
yd=fy*(yp-cy)/Dpre
where yd is the Y-axis coordinate of the center point of the target object in the camera coordinate system.
Through the above steps, the final coordinates of the target object in the camera coordinate system are (xd, yd, Dpre).
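Putting S208 and S209 together, the coordinate calculation can be written as one small function. This is a literal transcription of the formulas stated above; the function and argument names are illustrative:

```python
def camera_coordinates(x_p: float, y_p: float, d_pre: float,
                       f_x: float, f_y: float, c_x: float, c_y: float):
    """Literal transcription of the formulas stated in S208 and S209.

    x_p, y_p -- corrected center point of the target object in pixel coordinates
    d_pre    -- second (corrected) depth prediction value Dpre
    f_x, f_y -- camera focal lengths on the X and Y axes
    c_x, c_y -- offsets of the optical axis on the X and Y axes
    """
    x_d = f_x * (x_p - c_x) / d_pre   # xd, per S208
    y_d = f_y * (y_p - c_y) / d_pre   # yd, per S209
    return x_d, y_d, d_pre            # (xd, yd, Dpre)
```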
According to the target object positioning method provided by the embodiment of the invention, after a target object is detected from an environment image, a first depth prediction value of the target object in the camera coordinate system is calculated based on the pixel coordinates of the rectangular frame and the camera parameters; a correction parameter is then calculated using a deep learning model, the first depth prediction value is corrected with the correction parameter to obtain a second depth prediction value, and the coordinates of the target object in the camera coordinate system are calculated based on the center point of the target object and the second depth prediction value. Correcting the depth value of the target object in the camera coordinate system with the deep learning model and calculating the coordinates of the target object from the corrected depth value improves the positional accuracy of the target object.
EXAMPLE III
Fig. 3 is a schematic structural diagram of a target object positioning apparatus according to a third embodiment of the present invention, as shown in fig. 3, the apparatus includes:
a first depth prediction value calculation module 301, configured to calculate a first depth prediction value of a target object in a camera coordinate system based on the target object detected from an environment image;
a correcting module 302, configured to correct the first depth prediction value to obtain a second depth prediction value;
a coordinate determination module 303, configured to calculate coordinates of the target object in the camera coordinate system based on the central point of the target object and the second depth prediction value.
In some embodiments of the present invention, the first depth prediction value calculation module 301 comprises:
the image acquisition sub-module is used for acquiring an environment image acquired by a camera of the unmanned equipment;
a target object detection sub-module for detecting a target object from the environment image, the target object being shown in the environment image with a rectangular frame;
and the first depth prediction value calculation sub-module is used for calculating a first depth prediction value of the target object in a camera coordinate system based on the pixel coordinates of the rectangular frame and the camera parameters.
In some embodiments of the invention, the first depth predictor calculation sub-module comprises:
the first numerical value calculation unit is used for subtracting the offset of the optical axis of the camera relative to the coordinate center of the projection plane on the Y axis from the maximum value of the Y axis coordinate in the pixel coordinates of the rectangular frame to obtain a first numerical value;
the second numerical value calculating unit is used for calculating the quotient of the focal length of the camera on the Y axis and the first numerical value to obtain a second numerical value;
the third numerical value calculation unit is used for calculating the product of the second numerical value and the height of the camera from the ground to obtain a third numerical value;
a fourth numerical value calculation unit, configured to calculate a product of the second numerical value and a tangent value of a tilt angle of the camera with respect to the ground, so as to obtain a fourth numerical value;
a fifth numerical value calculating unit, configured to calculate a quotient of the third numerical value and the fourth numerical value to obtain a fifth numerical value;
and the first depth prediction value calculation unit is used for calculating the sum of the third numerical value and the fifth numerical value to obtain a first depth prediction value of the target object in a camera coordinate system.
In some embodiments of the present invention, the correction module 302 includes:
the correction parameter output model training submodule is used for training a correction parameter output model based on a first depth predicted value and a known actual depth value of the target object in a camera coordinate system in advance;
the correction parameter calculation submodule is used for inputting the first depth prediction value of the target object in the camera coordinate system into the trained correction parameter output model for processing to obtain a correction parameter;
and the correction submodule is used for correcting the first depth predicted value by adopting the correction parameters to obtain a second depth predicted value.
In some embodiments of the invention, the target object locating device further comprises:
a central point determination module, configured to detect a central point of the target object from the environmental image before calculating coordinates of the target object in the camera coordinate system based on the central point of the target object and the second depth prediction value.
In some embodiments of the invention, the center point determination module comprises:
and the central point detection submodule is used for processing the environment image by adopting a CenterNet target detection model to obtain the central point of the target object.
In some embodiments of the invention, the center point determination module comprises:
the deviation output model submodule is used for training a deviation output model based on the central point of the rectangular frame and the central point of a known target object in advance;
the deviation calculation submodule is used for inputting the pixel coordinates of the central point of the rectangular frame into a trained deviation output model for processing to obtain the deviation between the central point of the rectangular frame and the central point of the target object;
and the central point correction submodule is used for correcting the central point of the rectangular frame based on the deviation to obtain the central point of the target object.
In some embodiments of the present invention, the coordinate determination module 303 comprises:
the X-axis coordinate calculation sub-module is used for calculating the X-axis coordinate of the central point of the target object in a camera coordinate system based on the X-axis coordinate of the pixel coordinate of the central point of the target object, the camera parameter and the second depth prediction value;
and the Y-axis coordinate calculation sub-module is used for calculating the Y-axis coordinate of the central point of the target object in a camera coordinate system based on the Y-axis coordinate of the pixel coordinate of the central point of the target object, the camera parameter and the second depth prediction value.
In some embodiments of the invention, the X-axis coordinate calculation sub-module comprises:
a sixth numerical value calculation unit, configured to calculate a difference between an X-axis coordinate of a pixel coordinate of the center point of the target object and an offset of the optical axis of the camera with respect to a coordinate center of the projection plane on the X axis, so as to obtain a sixth numerical value;
a seventh numerical value calculating unit, configured to calculate a product of the sixth numerical value and the focal length of the camera on the X axis to obtain a seventh numerical value;
and the X-axis coordinate calculation unit is used for calculating the quotient of the seventh numerical value and the second depth predicted value to obtain the X-axis coordinate of the central point of the target object in a camera coordinate system.
In some embodiments of the invention, the Y-axis coordinate calculation submodule comprises:
a ninth numerical value calculating unit, configured to calculate a difference between a Y-axis coordinate of a pixel coordinate of the center point of the target object and an offset of the optical axis of the camera with respect to a coordinate center of a projection plane on the Y-axis, so as to obtain a ninth numerical value;
a tenth numerical value calculating unit, configured to calculate a product of the ninth numerical value and a focal length of the camera in the Y axis to obtain a tenth numerical value;
and the Y-axis coordinate calculation unit is used for calculating the quotient of the tenth numerical value and the second depth prediction value to obtain the Y-axis coordinate of the central point of the target object in a camera coordinate system.
The target object positioning device can execute the target object positioning method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the target object positioning method.
Example four
A fourth embodiment of the present invention provides a computer device, and fig. 4 is a schematic structural diagram of the computer device provided in the fourth embodiment of the present invention, as shown in fig. 4, the computer device includes:
a processor 401, a memory 402, a communication module 403, an input device 404, and an output device 405; the number of the processors 401 in the mobile terminal may be one or more, and one processor 401 is taken as an example in fig. 4; the processor 401, the memory 402, the communication module 403, the input device 404 and the output device 405 in the mobile terminal may be connected by a bus or other means, and fig. 4 illustrates an example of a connection by a bus. The processor 401, memory 402, communication module 403, input device 404, and output device 405 described above may be integrated on a computer device.
The memory 402 is a computer-readable storage medium for storing software programs, computer-executable programs, and modules, such as the modules corresponding to the target object positioning method in the above embodiments. The processor 401 executes various functional applications and data processing of the computer device by executing software programs, instructions and modules stored in the memory 402, so as to realize the target object positioning method.
The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required for at least one function, and the data storage area may store data created according to the use of the computer device, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 402 may further include memory located remotely from the processor 401, which may be connected to the electronic device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
And a communication module 403, configured to establish a connection with an external device (e.g., an intelligent terminal), and implement data interaction with the external device. The input device 404 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function controls of the computer apparatus.
The computer device provided by this embodiment may execute the target object positioning method provided by any of the above embodiments of the present invention, and has corresponding functions and beneficial effects.
EXAMPLE five
An embodiment of the present invention provides a storage medium containing computer-executable instructions, where a computer program is stored on the storage medium, and when the computer program is executed by a processor, the method for positioning a target object according to any of the foregoing embodiments of the present invention is implemented, where the method includes:
calculating a first depth prediction value of a target object in a camera coordinate system based on the target object detected from an environment image;
correcting the first depth predicted value to obtain a second depth predicted value;
calculating coordinates of the target object in the camera coordinate system based on the center point of the target object and the second depth prediction value.
It should be noted that, as for the apparatus, the device and the storage medium embodiments, since they are basically similar to the method embodiments, the description is relatively simple, and in relevant places, reference may be made to the partial description of the method embodiments.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a robot, a personal computer, a server, or a network device) to execute the target object locating method according to any embodiment of the present invention.
It should be noted that, in the above apparatus, each module, sub-module, and unit included in the apparatus is only divided according to functional logic, but is not limited to the above division as long as the corresponding function can be implemented; in addition, the specific names of the functional modules are only for convenience of distinguishing from each other and are not used for limiting the protection scope of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by suitable instruction execution devices. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (13)

1. A method for locating a target object, comprising:
calculating a first depth prediction value of a target object in a camera coordinate system based on the target object detected from an environment image;
correcting the first depth predicted value to obtain a second depth predicted value;
calculating coordinates of the target object in the camera coordinate system based on the center point of the target object and the second depth prediction value.
2. The method according to claim 1, wherein calculating a first depth prediction value of a target object in a camera coordinate system based on the target object detected from an environmental image comprises:
acquiring an environment image acquired by a camera of the unmanned equipment;
detecting a target object from the environment image, wherein the target object is shown in the environment image by a rectangular frame;
calculating a first depth prediction value of the target object in a camera coordinate system based on the pixel coordinates of the rectangular frame and the camera parameters.
3. The method according to claim 2, wherein calculating the first depth prediction value of the target object in the camera coordinate system based on the pixel coordinates of the rectangular frame and the camera parameters comprises:
subtracting the offset of the optical axis of the camera relative to the coordinate center of the projection plane on the Y axis from the maximum value of the Y axis coordinate in the pixel coordinates of the rectangular frame to obtain a first numerical value;
calculating the quotient of the focal length of the camera on the Y axis and the first numerical value to obtain a second numerical value;
calculating the product of the second value and the height of the camera from the ground to obtain a third value;
calculating the product of the second numerical value and the tangent value of the inclination angle of the camera relative to the ground to obtain a fourth numerical value;
calculating the quotient of the third numerical value and the fourth numerical value to obtain a fifth numerical value;
and calculating the sum of the third numerical value and the fifth numerical value to obtain a first depth predicted value of the target object in a camera coordinate system.
4. The method for locating a target object according to any one of claims 1 to 3, wherein the step of modifying the first depth prediction value to obtain a second depth prediction value comprises:
training a correction parameter output model based on a first depth predicted value and a known actual depth value of a target object in a camera coordinate system in advance;
inputting the first depth predicted value of the target object in the camera coordinate system into the trained correction parameter output model for processing to obtain a correction parameter;
and correcting the first depth predicted value by adopting the correction parameters to obtain a second depth predicted value.
5. The method of any one of claims 1 to 3, further comprising, before calculating the coordinates of the target object in the camera coordinate system based on the center point of the target object and the second depth prediction value:
and detecting the central point of the target object from the environment image.
6. The target object locating method according to claim 5, wherein detecting the central point of the target object from the environment image comprises:
and processing the environment image by adopting a CenterNet target detection model to obtain a central point of a target object.
7. The target object locating method according to claim 5, wherein detecting the central point of the target object from the environment image comprises:
training a deviation output model based on the central point of the rectangular frame and the central point of a known target object in advance;
inputting the pixel coordinates of the central point of the rectangular frame into a trained deviation output model for processing to obtain the deviation between the central point of the rectangular frame and the central point of the target object;
and correcting the central point of the rectangular frame based on the deviation to obtain the central point of the target object.
8. The method according to any one of claims 1 to 3, wherein calculating the coordinates of the target object in the camera coordinate system based on the center point of the target object and the second depth prediction value comprises:
calculating the X-axis coordinate of the central point of the target object in a camera coordinate system based on the X-axis coordinate of the pixel coordinate of the central point of the target object, the camera parameter and the second depth prediction value;
and calculating the Y-axis coordinate of the central point of the target object in a camera coordinate system based on the Y-axis coordinate of the pixel coordinate of the central point of the target object, the camera parameter and the second depth prediction value.
9. The target object positioning method according to claim 8, wherein calculating an X-axis coordinate of the center point of the target object in a camera coordinate system based on the X-axis coordinate of the pixel coordinate of the center point of the target object, the camera parameter, and the second depth prediction value comprises:
calculating the difference between the X-axis coordinate of the pixel coordinate of the central point of the target object and the offset of the optical axis of the camera relative to the coordinate center of the projection plane on the X axis to obtain a sixth numerical value;
calculating the product of the sixth numerical value and the focal length of the camera on the X axis to obtain a seventh numerical value;
and calculating the quotient of the seventh numerical value and the second depth predicted value to obtain the X-axis coordinate of the central point of the target object in a camera coordinate system.
10. The target object positioning method according to claim 8, wherein calculating Y-axis coordinates of the center point of the target object in a camera coordinate system based on the Y-axis coordinates of the pixel coordinates of the center point of the target object, the camera parameters, and the second depth prediction value includes:
calculating the difference between the Y-axis coordinate of the pixel coordinate of the central point of the target object and the offset of the optical axis of the camera relative to the coordinate center of the projection plane on the Y axis to obtain a ninth value;
calculating the product of the ninth numerical value and the focal length of the camera on the Y axis to obtain a tenth numerical value;
and calculating the quotient of the tenth numerical value and the second depth predicted value to obtain the Y-axis coordinate of the central point of the target object in a camera coordinate system.
11. A target object positioning apparatus, comprising:
a first depth prediction value calculation module, configured to calculate a first depth prediction value of a target object in a camera coordinate system based on the target object detected from an environment image;
the correction module is used for correcting the first depth predicted value to obtain a second depth predicted value;
a coordinate determination module to calculate coordinates of the target object in the camera coordinate system based on the center point of the target object and the second depth prediction value.
12. A computer device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the target object localization method as recited in any of claims 1-10.
13. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method for object localization as claimed in any one of claims 1 to 10.
CN202110986953.XA 2021-08-26 2021-08-26 Target object positioning method, device, equipment and storage medium Pending CN113643359A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110986953.XA CN113643359A (en) 2021-08-26 2021-08-26 Target object positioning method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110986953.XA CN113643359A (en) 2021-08-26 2021-08-26 Target object positioning method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113643359A true CN113643359A (en) 2021-11-12

Family

ID=78423966

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110986953.XA Pending CN113643359A (en) 2021-08-26 2021-08-26 Target object positioning method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113643359A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114842426A (en) * 2022-07-06 2022-08-02 广东电网有限责任公司肇庆供电局 Transformer substation equipment state monitoring method and system based on accurate alignment camera shooting

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150145965A1 (en) * 2013-11-26 2015-05-28 Mobileye Vision Technologies Ltd. Stereo auto-calibration from structure-from-motion
US20190278983A1 (en) * 2018-03-12 2019-09-12 Nvidia Corporation Three-dimensional (3d) pose estimation from a monocular camera
CN111428859A (en) * 2020-03-05 2020-07-17 北京三快在线科技有限公司 Depth estimation network training method and device for automatic driving scene and autonomous vehicle
CN111680554A (en) * 2020-04-29 2020-09-18 北京三快在线科技有限公司 Depth estimation method and device for automatic driving scene and autonomous vehicle
CN112487979A (en) * 2020-11-30 2021-03-12 北京百度网讯科技有限公司 Target detection method, model training method, device, electronic device and medium
CN113177976A (en) * 2021-04-29 2021-07-27 深圳安智杰科技有限公司 Depth estimation method and device, electronic equipment and storage medium
CN113256698A (en) * 2021-06-09 2021-08-13 中国人民解放军国防科技大学 Monocular 3D reconstruction method with depth prediction

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150145965A1 (en) * 2013-11-26 2015-05-28 Mobileye Vision Technologies Ltd. Stereo auto-calibration from structure-from-motion
US20190278983A1 (en) * 2018-03-12 2019-09-12 Nvidia Corporation Three-dimensional (3d) pose estimation from a monocular camera
CN111428859A (en) * 2020-03-05 2020-07-17 北京三快在线科技有限公司 Depth estimation network training method and device for automatic driving scene and autonomous vehicle
CN111680554A (en) * 2020-04-29 2020-09-18 北京三快在线科技有限公司 Depth estimation method and device for automatic driving scene and autonomous vehicle
CN112487979A (en) * 2020-11-30 2021-03-12 北京百度网讯科技有限公司 Target detection method, model training method, device, electronic device and medium
CN113177976A (en) * 2021-04-29 2021-07-27 深圳安智杰科技有限公司 Depth estimation method and device, electronic equipment and storage medium
CN113256698A (en) * 2021-06-09 2021-08-13 中国人民解放军国防科技大学 Monocular 3D reconstruction method with depth prediction

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114842426A (en) * 2022-07-06 2022-08-02 广东电网有限责任公司肇庆供电局 Transformer substation equipment state monitoring method and system based on accurate alignment camera shooting
CN114842426B (en) * 2022-07-06 2022-10-04 广东电网有限责任公司肇庆供电局 Transformer substation equipment state monitoring method and system based on accurate alignment camera shooting

Similar Documents

Publication Publication Date Title
US11422261B2 (en) Robot relocalization method and apparatus and robot using the same
CN108885791B (en) Ground detection method, related device and computer readable storage medium
CN109543493B (en) Lane line detection method and device and electronic equipment
US11045953B2 (en) Relocalization method and robot using the same
CN108197590B (en) Pavement detection method, device, terminal and storage medium
KR101995223B1 (en) System, module and method for detecting pedestrian, computer program
US9802539B2 (en) Distance and direction estimation of a target point from a vehicle using monocular video camera
CN113052907B (en) Positioning method of mobile robot in dynamic environment
CN112862890B (en) Road gradient prediction method, device and storage medium
CN114047487B (en) Radar and vehicle body external parameter calibration method and device, electronic equipment and storage medium
CN108319931B (en) Image processing method and device and terminal
CN108376384B (en) Method and device for correcting disparity map and storage medium
CN111046809B (en) Obstacle detection method, device, equipment and computer readable storage medium
CN114550042A (en) Road vanishing point extraction method, vehicle-mounted sensor calibration method and device
CN115546313A (en) Vehicle-mounted camera self-calibration method and device, electronic equipment and storage medium
CN114972427A (en) Target tracking method based on monocular vision, terminal equipment and storage medium
CN113643359A (en) Target object positioning method, device, equipment and storage medium
CN114919584A (en) Motor vehicle fixed point target distance measuring method and device and computer readable storage medium
CN116778458B (en) Parking space detection model construction method, parking space detection method, equipment and storage medium
CN110880003B (en) Image matching method and device, storage medium and automobile
CN112304322B (en) Restarting method after visual positioning failure and vehicle-mounted terminal
CN114037977B (en) Road vanishing point detection method, device, equipment and storage medium
Wang et al. Road edge detection based on improved RANSAC and 2D LIDAR Data
KR20210103865A (en) Vanishing point extraction device and method of extracting vanishing point
EP3229173B1 (en) Method and apparatus for determining a traversable path

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination