CN113643359A - Target object positioning method, device, equipment and storage medium - Google Patents

Target object positioning method, device, equipment and storage medium

Info

Publication number
CN113643359A
Authority
CN
China
Prior art keywords
target object
camera
value
calculating
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110986953.XA
Other languages
Chinese (zh)
Inventor
郑义
杨庆雄
韩旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Weride Technology Co Ltd
Original Assignee
Guangzhou Weride Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Weride Technology Co Ltd filed Critical Guangzhou Weride Technology Co Ltd
Priority to CN202110986953.XA priority Critical patent/CN113643359A/en
Publication of CN113643359A publication Critical patent/CN113643359A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30248 Vehicle exterior or interior
    • G06T2207/30252 Vehicle exterior; Vicinity of vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target object positioning method, apparatus, device, and storage medium. After a target object is detected from an environment image, a first depth prediction value of the target object in a camera coordinate system is calculated; the first depth prediction value is then corrected to obtain a second depth prediction value, and the coordinates of the target object in the camera coordinate system are calculated based on the center point of the target object and the second depth prediction value. Correcting the depth value of the target object in the camera coordinate system and calculating the coordinates of the target object from the corrected depth value improves the positional accuracy of the target object.

Description

Target object positioning method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of automatic driving, in particular to a target object positioning method, device, equipment and storage medium.
Background
An autonomous vehicle is a new type of intelligent vehicle that senses its surrounding environment through on-board sensors and acquires environment information. The environment information is computed and analyzed by a control device (i.e., the vehicle-mounted intelligent brain), which then controls the various devices in the autonomous vehicle by sending instructions to the Electronic Control Unit (ECU), thereby realizing fully automatic operation of the vehicle and achieving the goal of unmanned driving.
Perception of external target objects (e.g., obstacles) is one of the most important functions of automatic driving, since it allows an autonomous vehicle to avoid obstacles in time according to the perceived target objects and thus prevent accidents. The sensors most commonly used for perceiving target objects are lidar and cameras: lidar has the disadvantage of high cost, while camera-based detection yields large errors in the estimated position of the target object.
Disclosure of Invention
The invention provides a target object positioning method, a target object positioning device, target object positioning equipment and a storage medium, which are used for improving the position accuracy of a target object.
In a first aspect, an embodiment of the present invention provides a target object positioning method, including:
calculating a first depth prediction value of a target object in a camera coordinate system based on the target object detected from an environment image;
correcting the first depth predicted value to obtain a second depth predicted value;
calculating coordinates of the target object in the camera coordinate system based on the center point of the target object and the second depth prediction value.
In a second aspect, an embodiment of the present invention further provides a target object positioning apparatus, including:
a first depth prediction value calculation module, configured to calculate a first depth prediction value of a target object in a camera coordinate system based on the target object detected from an environment image;
the correction module is used for correcting the first depth predicted value to obtain a second depth predicted value;
a coordinate determination module to calculate coordinates of the target object in the camera coordinate system based on the center point of the target object and the second depth prediction value.
In a third aspect, an embodiment of the present invention further provides a computer device, including:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the target object positioning method provided by the first aspect of the invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the target object positioning method according to the first aspect of the present invention.
According to the target object positioning method provided by the embodiment of the invention, after the target object is detected from the environment image, the first depth predicted value of the target object in the camera coordinate system is calculated, then, the first depth predicted value is corrected to obtain the second depth predicted value, and then, the coordinate of the target object in the camera coordinate system is calculated based on the central point of the target object and the second depth predicted value. The position accuracy of the target object is improved by correcting the depth value of the target object in the camera coordinate system and calculating the coordinate of the target object in the camera coordinate system based on the corrected depth value.
Drawings
Fig. 1 is a flowchart of a target object positioning method according to an embodiment of the present invention;
fig. 2A is a flowchart of a target object positioning method according to a second embodiment of the present invention;
FIG. 2B is an imaging schematic of the camera;
fig. 3 is a schematic structural diagram of a target object positioning apparatus according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a target object positioning method according to an embodiment of the present invention. This embodiment is applicable to cases where the camera coordinates of a target object are calculated from an environment image acquired by a camera. The method may be executed by the target object positioning apparatus provided by an embodiment of the present invention, which may be implemented in software and/or hardware and is generally configured in a computer device. As shown in Fig. 1, the method specifically includes the following steps:
s101, calculating a first depth predicted value of a target object in a camera coordinate system based on the target object detected from the environment image.
In the embodiment of the invention, the environment image is an image, within the visual range of the camera, acquired by the unmanned device while it is traveling. Illustratively, in the embodiment of the present invention, the camera is a monocular camera. A common target detection method, such as a Fast R-CNN or YOLO (You Only Look Once) series model, may be adopted to detect the target object from the environment image. The target object may be an obstacle along the course of the unmanned device; for example, for an unmanned vehicle, the target object may be another vehicle, a pedestrian, or another object temporarily appearing on the road surface, which is not limited by the embodiment of the present invention.
After the target object is detected, i.e., once the pixel coordinates of the target object in the environment image are determined, a first depth prediction value of the target object in the camera coordinate system can be calculated based on the imaging principle of the camera in combination with the camera parameters. Specifically, the road surface may be assumed to be a horizontal plane, and the point of the target object with the maximum Y-axis value in the pixel coordinate system is taken as the grounding point; the first depth prediction value of the target object in the camera coordinate system is then calculated from this point in combination with the camera parameters and the height of the camera. Here, the depth of the target object in the camera coordinate system is the horizontal distance of the target object from the origin of the camera coordinate system.
And S102, correcting the first depth predicted value to obtain a second depth predicted value.
The first depth prediction value is calculated based on the grounding point of the target object. However, the grounding point is not the center point of the target object on the Y axis of the pixel coordinate system, which introduces a certain error between the calculated first depth prediction value and the actual depth value. To eliminate this error, the first depth prediction value needs to be corrected to obtain a second depth prediction value.
And S103, calculating the coordinates of the target object in the camera coordinate system based on the central point of the target object and the second depth predicted value.
The central point of the target object may be a central point of the target object in the pixel coordinate system, that is, a 2-dimensional projection of the central point of the 3-dimensional target object in the environment image. Based on the imaging principle of the camera, the X-axis coordinate and the Y-axis coordinate of the target object in the camera coordinate system can be calculated by combining the central point of the target object, the second depth prediction value and the camera parameter, so that the 3-dimensional coordinate of the target object in the camera coordinate system is obtained.
According to the target object positioning method provided by the embodiment of the invention, after the target object is detected from the environment image, the first depth predicted value of the target object in the camera coordinate system is calculated, then, the first depth predicted value is corrected to obtain the second depth predicted value, and then, the coordinate of the target object in the camera coordinate system is calculated based on the central point of the target object and the second depth predicted value. The position accuracy of the target object is improved by correcting the depth value of the target object in the camera coordinate system and calculating the coordinate of the target object in the camera coordinate system based on the corrected depth value.
Example two
Fig. 2A is a flowchart of a target object positioning method according to a second embodiment of the present invention. This embodiment refines the first embodiment and details the specific implementation of each step of the target object positioning method. As shown in Fig. 2A, the method includes:
s201, acquiring an environment image acquired by a camera of the unmanned equipment.
In the embodiment of the invention, the unmanned equipment can be an unmanned vehicle, the unmanned vehicle is provided with a camera, and the environment image is an image which is acquired by the unmanned equipment in the traveling process and is within the visible range of the camera. Illustratively, in the embodiment of the present invention, the camera is a monocular camera.
S202, a target object is detected from the environment image, and the target object is shown in the environment image by a rectangular frame.
For example, in the embodiment of the present invention, a common target detection method, such as a Fast R-CNN or YOLO (You Only Look Once) series model, may be adopted to detect the target object from the environment image. The target object may be an obstacle along the course of the unmanned device; for example, for an unmanned vehicle, the target object may be another vehicle, a pedestrian, or another object temporarily appearing on the road surface, which is not limited by the embodiment of the present invention. The detected target object is usually marked by a rectangular frame, and the pixel coordinates of the rectangular frame in the environment image can be represented as (xmin, ymin, xmax, ymax), where xmin and xmax are the left and right boundaries of the rectangular frame, and ymin and ymax are the upper and lower boundaries of the rectangular frame.
S203, calculating a first depth predicted value of the target object in a camera coordinate system based on the pixel coordinates of the rectangular frame and the camera parameters.
Fig. 2B is an imaging schematic diagram of the camera; reference is made to Fig. 2B. After the pixel coordinates of the rectangular frame are obtained, the first depth prediction value of the target object in the camera coordinate system is calculated based on the imaging principle of the camera, the pixel coordinates of the rectangular frame, and the camera parameters. The camera parameters are the intrinsic parameters of the camera, which can be represented by the following matrix:
[ fx  0   cx ]
[ 0   fy  cy ]
[ 0   0   1  ]

where fx is the focal length of the camera in the X direction of the camera coordinate system, fy is the focal length in the Y direction of the camera coordinate system, cx is the offset of the camera optical axis relative to the coordinate center of the projection plane on the X axis, and cy is the offset of the camera optical axis relative to the coordinate center of the projection plane on the Y axis.
Specifically, the offset cy of the camera optical axis relative to the coordinate center of the projection plane on the Y axis is subtracted from the maximum Y-axis value ymax of the pixel coordinates of the rectangular frame to obtain a first value. The quotient of the camera focal length fy on the Y axis and the first value is calculated to obtain a second value. The product of the second value and the height h of the camera above the ground is calculated to obtain a third value. The product of the second value and the tangent of the inclination angle α of the camera relative to the ground is calculated to obtain a fourth value. The quotient of the third value and the fourth value is calculated to obtain a fifth value. The sum of the third value and the fifth value is calculated to obtain the first depth prediction value Dref of the target object in the camera coordinate system.
The mathematical expression of the above calculation process is:
Dref = t*h + (t*h)/(t*tan(α))
t = fy/(ymax - cy)
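For readability, the step-by-step calculation above can be transcribed into a short routine. The following Python sketch is only a literal transcription of the described steps; the function and argument names are illustrative and not part of the patent:

```python
import math

def first_depth_prediction(y_max: float, f_y: float, c_y: float,
                           cam_height: float, tilt_alpha: float) -> float:
    """Transcription of the calculation steps described above.

    y_max      -- maximum Y-axis pixel coordinate of the rectangular frame (grounding point)
    f_y        -- camera focal length on the Y axis, in pixels
    c_y        -- offset of the optical axis from the projection-plane center on the Y axis
    cam_height -- height h of the camera above the ground
    tilt_alpha -- inclination angle of the camera relative to the ground, in radians
    """
    first = y_max - c_y                      # first value
    second = f_y / first                     # second value t
    third = second * cam_height              # third value
    fourth = second * math.tan(tilt_alpha)   # fourth value
    fifth = third / fourth                   # fifth value
    return third + fifth                     # first depth prediction value Dref
```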
s204, training a correction parameter output model in advance based on the first depth predicted value of the target object in the camera coordinate system and the known actual depth value.
The lower boundary ymax of the rectangular frame predicted by the target detection algorithm in the previous step is not always accurate, and the target object has a certain height, so the center point of the target object is not its grounding point. For these reasons, the first depth prediction value Dref calculated in the above step needs to be corrected.
For example, in the embodiment of the invention, a correction parameter is calculated using a deep learning model, and the correction parameter is then used to correct the first depth prediction value Dref.
Specifically, a correction parameter output model is established based on initial model parameters, and a large number of data samples are collected in advance, where each data sample includes a first depth prediction value of a target object and the known actual depth value of that target object. The correction parameter output model is trained iteratively with these data samples until the model converges (i.e., until the error between the first depth prediction value and the actual depth value meets the requirement).
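The patent does not specify the architecture of the correction parameter output model. The PyTorch sketch below is only an assumed minimal setup (a small fully connected regressor trained with mean squared error) to illustrate the training procedure described above, using the ratio between the actual and predicted depth as the regression target (the correction parameter δ defined in the next step):

```python
import torch
import torch.nn as nn

# Assumed architecture: a small MLP mapping Dref to a correction parameter delta.
correction_model = nn.Sequential(
    nn.Linear(1, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

optimizer = torch.optim.Adam(correction_model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_correction_model(d_ref: torch.Tensor, d_gt: torch.Tensor, epochs: int = 100):
    """d_ref, d_gt: (N, 1) tensors of first depth predictions and actual depth values."""
    target_delta = d_gt / d_ref          # regression target: delta = Dgt / Dref
    for _ in range(epochs):
        optimizer.zero_grad()
        pred_delta = correction_model(d_ref)
        loss = loss_fn(pred_delta, target_delta)
        loss.backward()
        optimizer.step()
```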
S205, inputting the first depth prediction value of the target object in the camera coordinate system into the trained correction parameter output model for processing to obtain a correction parameter.
In practical application, the first depth prediction value Dref of the target object in the camera coordinate system obtained in the previous step is input into the trained correction parameter output model for processing to obtain a correction parameter δ. The correction parameter is defined as follows:
δ=Dgt/Dref
where Dgt is the actual depth value of the target object.
S206, correcting the first depth predicted value by adopting the correction parameters to obtain a second depth predicted value.
Illustratively, in the embodiment of the invention, the first depth prediction value Dref is multiplied by the correction parameter δ to obtain the second depth prediction value Dpre. The specific calculation formula is as follows:
Dpre=δ*Dref
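Continuing the assumed sketch above (same correction_model and torch import), applying the correction at inference time then reduces to a single multiplication:

```python
def corrected_depth(d_ref: float) -> float:
    """Apply the predicted correction parameter to obtain Dpre = delta * Dref."""
    with torch.no_grad():
        delta = correction_model(torch.tensor([[d_ref]], dtype=torch.float32)).item()
    return delta * d_ref
```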
and S207, detecting the central point of the target object from the environment image.
In the embodiment of the present invention, the central point of the target object may be directly predicted by using a deep neural network, or the central point of the rectangular frame obtained in the foregoing step may be subjected to position correction to obtain the central point of the target object, which is not limited herein.
Illustratively, in some embodiments of the invention, the environment image is processed using a CenterNet target detection model to obtain the center point of the target object.
Traditional models such as YOLO and Fast R-CNN rely on a large number of anchors to detect targets; the number of anchors is huge and their sizes are designed manually, so detection speed and accuracy are limited. CenterNet is an anchor-free target detection model that can directly predict the center point of a target object, improving target detection efficiency and the positional accuracy of the center point.
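As an illustration of the anchor-free idea only (this is not the actual CenterNet API), the center point can be read off a predicted center heatmap by taking its peak. A minimal sketch, assuming a heatmap of shape (H, W) and a known downsampling stride:

```python
import numpy as np

def extract_center_from_heatmap(heatmap: np.ndarray, stride: int = 4):
    """Return the pixel coordinates of the strongest center-point response.

    heatmap -- (H, W) array of center-point confidences produced by the detector
    stride  -- downsampling factor between the heatmap and the input image
    """
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    # Map the heatmap cell back to input-image pixel coordinates.
    return col * stride, row * stride
```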
Illustratively, in another embodiment of the present invention, a deep learning model is used to calculate the deviation between the center point of the rectangular frame and the actual center point of the target object, and this deviation is then used to correct the position of the center point of the rectangular frame.
Specifically, a deviation output model is established based on initial model parameters, and a large number of data samples are collected in advance, where each data sample includes the center point coordinates of the rectangular frame of a target object and the known actual center point coordinates of that target object. The deviation output model is trained iteratively with these data samples until the model converges (i.e., until the error between the center point coordinates of the rectangular frame and the actual center point coordinates meets the requirement).
In practical application, the pixel coordinates of the center point of the rectangular frame obtained in the target detection step, ((xmax + xmin)/2, (ymax + ymin)/2), are input into the trained deviation output model for processing to obtain the deviation (xoffset, yoffset) between the center point of the rectangular frame and the actual center point of the target object.
And correcting the central point of the rectangular frame based on the deviation to obtain the central point of the target object. Specifically, the calculation formula is as follows:
xp = (xmax + xmin)/2 + xoffset
yp = (ymax + ymin)/2 + yoffset
where xp is the X-axis coordinate of the corrected center point of the target object in the pixel coordinate system, and yp is the Y-axis coordinate of the corrected center point of the target object in the pixel coordinate system.
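The correction of the rectangular-frame center by the predicted deviation can likewise be transcribed directly. The deviation output model itself is not specified here, so the sketch below simply assumes a callable that returns (xoffset, yoffset):

```python
def corrected_center_point(x_min, y_min, x_max, y_max, deviation_model):
    """Correct the rectangular-frame center with a predicted deviation.

    deviation_model -- assumed callable mapping the frame-center pixel
                       coordinates to the deviation (xoffset, yoffset)
    """
    cx_box = (x_max + x_min) / 2.0
    cy_box = (y_max + y_min) / 2.0
    x_offset, y_offset = deviation_model(cx_box, cy_box)
    x_p = cx_box + x_offset   # xp
    y_p = cy_box + y_offset   # yp
    return x_p, y_p
```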
And S208, calculating the X-axis coordinate of the central point of the target object in the camera coordinate system based on the X-axis coordinate of the pixel coordinate of the central point of the target object, the camera parameter and the second depth prediction value.
After the pixel coordinate of the central point of the target object and the second depth prediction value are obtained, based on the imaging principle of the camera, the X-axis coordinate of the central point of the target object in a camera coordinate system is calculated according to the X-axis coordinate of the pixel coordinate of the central point of the target object, the camera parameters and the second depth prediction value.
Specifically, the difference between the X-axis coordinate of the pixel coordinate of the center point of the target object and the offset of the optical axis of the camera with respect to the coordinate center of the projection plane on the X-axis is calculated to obtain a sixth numerical value. And calculating the product of the sixth numerical value and the focal length of the camera on the X axis to obtain a seventh numerical value. And calculating the quotient of the seventh numerical value and the second depth predicted value to obtain the X-axis coordinate of the central point of the target object in the camera coordinate system.
The mathematical expression of the above calculation process is:
xd=fx*(xp-cx)/Dpre
where xd is the X-axis coordinate of the center point of the target object in the camera coordinate system.
S209, calculating the Y-axis coordinate of the central point of the target object in the camera coordinate system based on the Y-axis coordinate of the pixel coordinate of the central point of the target object, the camera parameter and the second depth prediction value.
Similarly, after the pixel coordinate of the central point of the target object and the second depth prediction value are obtained, based on the imaging principle of the camera, the Y-axis coordinate of the central point of the target object in the camera coordinate system is calculated according to the Y-axis coordinate of the pixel coordinate of the central point of the target object, the camera parameter and the second depth prediction value.
Specifically, the difference between the Y-axis coordinate of the pixel coordinate of the center point of the target object and the offset of the optical axis of the camera with respect to the coordinate center of the projection plane on the Y-axis is calculated to obtain a ninth value. And calculating the product of the ninth value and the focal length of the camera on the Y axis to obtain a tenth value. And calculating the quotient of the tenth numerical value and the second depth predicted value to obtain the Y-axis coordinate of the central point of the target object in the camera coordinate system.
The mathematical expression of the above calculation process is:
yd=fy*(yp-cy)/Dpre
where yd is the Y-axis coordinate of the center point of the target object in the camera coordinate system.
Through the above steps, the final coordinates of the target object in the camera coordinate system are (xd, yd, Dpre).
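Putting S208 and S209 together, the coordinate calculation can be written as one small function. This is a literal transcription of the formulas stated above; the function and argument names are illustrative:

```python
def camera_coordinates(x_p: float, y_p: float, d_pre: float,
                       f_x: float, f_y: float, c_x: float, c_y: float):
    """Literal transcription of the formulas stated in S208 and S209.

    x_p, y_p -- corrected center point of the target object in pixel coordinates
    d_pre    -- second (corrected) depth prediction value Dpre
    f_x, f_y -- camera focal lengths on the X and Y axes
    c_x, c_y -- offsets of the optical axis on the X and Y axes
    """
    x_d = f_x * (x_p - c_x) / d_pre   # xd, per S208
    y_d = f_y * (y_p - c_y) / d_pre   # yd, per S209
    return x_d, y_d, d_pre            # (xd, yd, Dpre)
```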
According to the target object positioning method provided by the embodiment of the invention, after a target object is detected from an environment image, a first depth prediction value of the target object in the camera coordinate system is calculated based on the pixel coordinates of the rectangular frame and the camera parameters; a correction parameter is then calculated using a deep learning model, the first depth prediction value is corrected with the correction parameter to obtain a second depth prediction value, and the coordinates of the target object in the camera coordinate system are calculated based on the center point of the target object and the second depth prediction value. Correcting the depth value of the target object in the camera coordinate system with the deep learning model and calculating the coordinates of the target object from the corrected depth value improves the positional accuracy of the target object.
EXAMPLE III
Fig. 3 is a schematic structural diagram of a target object positioning apparatus according to a third embodiment of the present invention, as shown in fig. 3, the apparatus includes:
a first depth prediction value calculation module 301, configured to calculate a first depth prediction value of a target object in a camera coordinate system based on the target object detected from an environment image;
a correcting module 302, configured to correct the first depth prediction value to obtain a second depth prediction value;
a coordinate determination module 303, configured to calculate coordinates of the target object in the camera coordinate system based on the central point of the target object and the second depth prediction value.
In some embodiments of the present invention, the first depth prediction value calculation module 301 comprises:
the image acquisition sub-module is used for acquiring an environment image acquired by a camera of the unmanned equipment;
a target object detection sub-module for detecting a target object from the environment image, the target object being shown in the environment image with a rectangular frame;
and the first depth prediction value calculation sub-module is used for calculating a first depth prediction value of the target object in a camera coordinate system based on the pixel coordinates of the rectangular frame and the camera parameters.
In some embodiments of the invention, the first depth predictor calculation sub-module comprises:
the first numerical value calculation unit is used for subtracting the offset of the optical axis of the camera relative to the coordinate center of the projection plane on the Y axis from the maximum value of the Y axis coordinate in the pixel coordinates of the rectangular frame to obtain a first numerical value;
the second numerical value calculating unit is used for calculating the quotient of the focal length of the camera on the Y axis and the first numerical value to obtain a second numerical value;
the third numerical value calculation unit is used for calculating the product of the second numerical value and the height of the camera from the ground to obtain a third numerical value;
a fourth numerical value calculation unit, configured to calculate a product of the second numerical value and a tangent value of a tilt angle of the camera with respect to the ground, so as to obtain a fourth numerical value;
a fifth numerical value calculating unit, configured to calculate a quotient of the third numerical value and the fourth numerical value to obtain a fifth numerical value;
and the first depth prediction value calculation unit is used for calculating the sum of the third numerical value and the fifth numerical value to obtain a first depth prediction value of the target object in a camera coordinate system.
In some embodiments of the present invention, the correction module 302 includes:
the correction parameter output model training submodule is used for training a correction parameter output model based on a first depth predicted value and a known actual depth value of the target object in a camera coordinate system in advance;
the correction parameter calculation submodule is used for inputting the first depth prediction value of the target object in the camera coordinate system into the trained correction parameter output model for processing to obtain a correction parameter;
and the correction submodule is used for correcting the first depth predicted value by adopting the correction parameters to obtain a second depth predicted value.
In some embodiments of the invention, the target object locating device further comprises:
a central point determination module, configured to detect a central point of the target object from the environmental image before calculating coordinates of the target object in the camera coordinate system based on the central point of the target object and the second depth prediction value.
In some embodiments of the invention, the center point determination module comprises:
and the central point detection submodule is used for processing the environment image by adopting a CenterNet target detection model to obtain the central point of the target object.
In some embodiments of the invention, the center point determination module comprises:
the deviation output model submodule is used for training a deviation output model based on the central point of the rectangular frame and the central point of a known target object in advance;
the deviation calculation submodule is used for inputting the pixel coordinates of the central point of the rectangular frame into a trained deviation output model for processing to obtain the deviation between the central point of the rectangular frame and the central point of the target object;
and the central point correction submodule is used for correcting the central point of the rectangular frame based on the deviation to obtain the central point of the target object.
In some embodiments of the present invention, the coordinate determination module 303 comprises:
the X-axis coordinate calculation sub-module is used for calculating the X-axis coordinate of the central point of the target object in a camera coordinate system based on the X-axis coordinate of the pixel coordinate of the central point of the target object, the camera parameter and the second depth prediction value;
and the Y-axis coordinate calculation sub-module is used for calculating the Y-axis coordinate of the central point of the target object in a camera coordinate system based on the Y-axis coordinate of the pixel coordinate of the central point of the target object, the camera parameter and the second depth prediction value.
In some embodiments of the invention, the X-axis coordinate calculation sub-module comprises:
a sixth numerical value calculation unit, configured to calculate a difference between an X-axis coordinate of a pixel coordinate of the center point of the target object and an offset of the optical axis of the camera with respect to a coordinate center of the projection plane on the X axis, so as to obtain a sixth numerical value;
a seventh numerical value calculating unit, configured to calculate a product of the sixth numerical value and the focal length of the camera on the X axis to obtain a seventh numerical value;
and the X-axis coordinate calculation unit is used for calculating the quotient of the seventh numerical value and the second depth predicted value to obtain the X-axis coordinate of the central point of the target object in a camera coordinate system.
In some embodiments of the invention, the Y-axis coordinate calculation submodule comprises:
a ninth numerical value calculating unit, configured to calculate a difference between a Y-axis coordinate of a pixel coordinate of the center point of the target object and an offset of the optical axis of the camera with respect to a coordinate center of a projection plane on the Y-axis, so as to obtain a ninth numerical value;
a tenth numerical value calculating unit, configured to calculate a product of the ninth numerical value and a focal length of the camera in the Y axis to obtain a tenth numerical value;
and the Y-axis coordinate calculation unit is used for calculating the quotient of the tenth numerical value and the second depth prediction value to obtain the Y-axis coordinate of the central point of the target object in a camera coordinate system.
The target object positioning device can execute the target object positioning method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the target object positioning method.
Example four
A fourth embodiment of the present invention provides a computer device, and fig. 4 is a schematic structural diagram of the computer device provided in the fourth embodiment of the present invention, as shown in fig. 4, the computer device includes:
a processor 401, a memory 402, a communication module 403, an input device 404, and an output device 405; the number of the processors 401 in the mobile terminal may be one or more, and one processor 401 is taken as an example in fig. 4; the processor 401, the memory 402, the communication module 403, the input device 404 and the output device 405 in the mobile terminal may be connected by a bus or other means, and fig. 4 illustrates an example of a connection by a bus. The processor 401, memory 402, communication module 403, input device 404, and output device 405 described above may be integrated on a computer device.
The memory 402 is a computer-readable storage medium for storing software programs, computer-executable programs, and modules, such as the modules corresponding to the target object positioning method in the above embodiments. The processor 401 executes various functional applications and data processing of the computer device by executing software programs, instructions and modules stored in the memory 402, so as to realize the target object positioning method.
The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required for at least one function, and the data storage area may store data created according to the use of the computer device, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 402 may further include memory located remotely from the processor 401, which may be connected to the electronic device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
And a communication module 403, configured to establish a connection with an external device (e.g., an intelligent terminal), and implement data interaction with the external device. The input device 404 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function controls of the computer apparatus.
The computer device provided by this embodiment may execute the target object positioning method provided by any of the above embodiments of the present invention, and has corresponding functions and beneficial effects.
EXAMPLE five
An embodiment of the present invention provides a storage medium containing computer-executable instructions, where a computer program is stored on the storage medium, and when the computer program is executed by a processor, the method for positioning a target object according to any of the foregoing embodiments of the present invention is implemented, where the method includes:
calculating a first depth prediction value of a target object in a camera coordinate system based on the target object detected from an environment image;
correcting the first depth predicted value to obtain a second depth predicted value;
calculating coordinates of the target object in the camera coordinate system based on the center point of the target object and the second depth prediction value.
It should be noted that, as for the apparatus, the device and the storage medium embodiments, since they are basically similar to the method embodiments, the description is relatively simple, and in relevant places, reference may be made to the partial description of the method embodiments.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a robot, a personal computer, a server, or a network device) to execute the target object locating method according to any embodiment of the present invention.
It should be noted that, in the above apparatus, each module, sub-module, and unit included in the apparatus is only divided according to functional logic, but is not limited to the above division as long as the corresponding function can be implemented; in addition, the specific names of the functional modules are only for convenience of distinguishing from each other and are not used for limiting the protection scope of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by suitable instruction execution devices. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (13)

1. A method for locating a target object, comprising:
calculating a first depth prediction value of a target object in a camera coordinate system based on the target object detected from an environment image;
correcting the first depth predicted value to obtain a second depth predicted value;
calculating coordinates of the target object in the camera coordinate system based on the center point of the target object and the second depth prediction value.
2. The method according to claim 1, wherein calculating a first depth prediction value of a target object in a camera coordinate system based on the target object detected from an environmental image comprises:
acquiring an environment image acquired by a camera of the unmanned equipment;
detecting a target object from the environment image, wherein the target object is shown in the environment image by a rectangular frame;
calculating a first depth prediction value of the target object in a camera coordinate system based on the pixel coordinates of the rectangular frame and the camera parameters.
3. The method according to claim 2, wherein calculating the first depth prediction value of the target object in the camera coordinate system based on the pixel coordinates of the rectangular frame and the camera parameters comprises:
subtracting the offset of the optical axis of the camera relative to the coordinate center of the projection plane on the Y axis from the maximum value of the Y axis coordinate in the pixel coordinates of the rectangular frame to obtain a first numerical value;
calculating the quotient of the focal length of the camera on the Y axis and the first numerical value to obtain a second numerical value;
calculating the product of the second value and the height of the camera from the ground to obtain a third value;
calculating the product of the second numerical value and the tangent value of the inclination angle of the camera relative to the ground to obtain a fourth numerical value;
calculating the quotient of the third numerical value and the fourth numerical value to obtain a fifth numerical value;
and calculating the sum of the third numerical value and the fifth numerical value to obtain a first depth predicted value of the target object in a camera coordinate system.
4. The method for locating a target object according to any one of claims 1 to 3, wherein the step of modifying the first depth prediction value to obtain a second depth prediction value comprises:
training a correction parameter output model based on a first depth predicted value and a known actual depth value of a target object in a camera coordinate system in advance;
inputting the first depth predicted value of the target object in the camera coordinate system into the trained correction parameter output model for processing to obtain a correction parameter;
and correcting the first depth predicted value by adopting the correction parameters to obtain a second depth predicted value.
5. The method of any one of claims 1 to 3, further comprising, before calculating the coordinates of the target object in the camera coordinate system based on the center point of the target object and the second depth prediction value:
and detecting the central point of the target object from the environment image.
6. The target object locating method according to claim 5, wherein detecting the central point of the target object from the environment image comprises:
and processing the environment image by adopting a CenterNet target detection model to obtain a central point of a target object.
7. The target object locating method according to claim 5, wherein detecting the central point of the target object from the environment image comprises:
training a deviation output model based on the central point of the rectangular frame and the central point of a known target object in advance;
inputting the pixel coordinates of the central point of the rectangular frame into a trained deviation output model for processing to obtain the deviation between the central point of the rectangular frame and the central point of the target object;
and correcting the central point of the rectangular frame based on the deviation to obtain the central point of the target object.
8. The method according to any one of claims 1 to 3, wherein calculating the coordinates of the target object in the camera coordinate system based on the center point of the target object and the second depth prediction value comprises:
calculating the X-axis coordinate of the central point of the target object in a camera coordinate system based on the X-axis coordinate of the pixel coordinate of the central point of the target object, the camera parameter and the second depth prediction value;
and calculating the Y-axis coordinate of the central point of the target object in a camera coordinate system based on the Y-axis coordinate of the pixel coordinate of the central point of the target object, the camera parameter and the second depth prediction value.
9. The target object positioning method according to claim 8, wherein calculating an X-axis coordinate of the center point of the target object in a camera coordinate system based on the X-axis coordinate of the pixel coordinate of the center point of the target object, the camera parameter, and the second depth prediction value comprises:
calculating the difference between the X-axis coordinate of the pixel coordinate of the central point of the target object and the offset of the optical axis of the camera relative to the coordinate center of the projection plane on the X axis to obtain a sixth numerical value;
calculating the product of the sixth numerical value and the focal length of the camera on the X axis to obtain a seventh numerical value;
and calculating the quotient of the seventh numerical value and the second depth predicted value to obtain the X-axis coordinate of the central point of the target object in a camera coordinate system.
10. The target object positioning method according to claim 8, wherein calculating Y-axis coordinates of the center point of the target object in a camera coordinate system based on the Y-axis coordinates of the pixel coordinates of the center point of the target object, the camera parameters, and the second depth prediction value includes:
calculating the difference between the Y-axis coordinate of the pixel coordinate of the central point of the target object and the offset of the optical axis of the camera relative to the coordinate center of the projection plane on the Y axis to obtain a ninth value;
calculating the product of the ninth numerical value and the focal length of the camera on the Y axis to obtain a tenth numerical value;
and calculating the quotient of the tenth numerical value and the second depth predicted value to obtain the Y-axis coordinate of the central point of the target object in a camera coordinate system.
11. A target object positioning apparatus, comprising:
a first depth prediction value calculation module, configured to calculate a first depth prediction value of a target object in a camera coordinate system based on the target object detected from an environment image;
the correction module is used for correcting the first depth predicted value to obtain a second depth predicted value;
a coordinate determination module to calculate coordinates of the target object in the camera coordinate system based on the center point of the target object and the second depth prediction value.
12. A computer device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the target object localization method as recited in any of claims 1-10.
13. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method for object localization as claimed in any one of claims 1 to 10.
CN202110986953.XA 2021-08-26 2021-08-26 Target object positioning method, device, equipment and storage medium Pending CN113643359A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110986953.XA CN113643359A (en) 2021-08-26 2021-08-26 Target object positioning method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110986953.XA CN113643359A (en) 2021-08-26 2021-08-26 Target object positioning method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113643359A true CN113643359A (en) 2021-11-12

Family

ID=78423966

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110986953.XA Pending CN113643359A (en) 2021-08-26 2021-08-26 Target object positioning method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113643359A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114842426A (en) * 2022-07-06 2022-08-02 广东电网有限责任公司肇庆供电局 Transformer substation equipment state monitoring method and system based on accurate alignment camera shooting

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150145965A1 (en) * 2013-11-26 2015-05-28 Mobileye Vision Technologies Ltd. Stereo auto-calibration from structure-from-motion
US20190278983A1 (en) * 2018-03-12 2019-09-12 Nvidia Corporation Three-dimensional (3d) pose estimation from a monocular camera
CN111428859A (en) * 2020-03-05 2020-07-17 北京三快在线科技有限公司 Depth estimation network training method and device for automatic driving scene and autonomous vehicle
CN111680554A (en) * 2020-04-29 2020-09-18 北京三快在线科技有限公司 Depth estimation method and device for automatic driving scene and autonomous vehicle
CN112487979A (en) * 2020-11-30 2021-03-12 北京百度网讯科技有限公司 Target detection method, model training method, device, electronic device and medium
CN113177976A (en) * 2021-04-29 2021-07-27 深圳安智杰科技有限公司 Depth estimation method and device, electronic equipment and storage medium
CN113256698A (en) * 2021-06-09 2021-08-13 中国人民解放军国防科技大学 Monocular 3D reconstruction method with depth prediction

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150145965A1 (en) * 2013-11-26 2015-05-28 Mobileye Vision Technologies Ltd. Stereo auto-calibration from structure-from-motion
US20190278983A1 (en) * 2018-03-12 2019-09-12 Nvidia Corporation Three-dimensional (3d) pose estimation from a monocular camera
CN111428859A (en) * 2020-03-05 2020-07-17 北京三快在线科技有限公司 Depth estimation network training method and device for automatic driving scene and autonomous vehicle
CN111680554A (en) * 2020-04-29 2020-09-18 北京三快在线科技有限公司 Depth estimation method and device for automatic driving scene and autonomous vehicle
CN112487979A (en) * 2020-11-30 2021-03-12 北京百度网讯科技有限公司 Target detection method, model training method, device, electronic device and medium
CN113177976A (en) * 2021-04-29 2021-07-27 深圳安智杰科技有限公司 Depth estimation method and device, electronic equipment and storage medium
CN113256698A (en) * 2021-06-09 2021-08-13 中国人民解放军国防科技大学 Monocular 3D reconstruction method with depth prediction

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114842426A (en) * 2022-07-06 2022-08-02 广东电网有限责任公司肇庆供电局 Transformer substation equipment state monitoring method and system based on accurate alignment camera shooting
CN114842426B (en) * 2022-07-06 2022-10-04 广东电网有限责任公司肇庆供电局 Transformer substation equipment state monitoring method and system based on accurate alignment camera shooting

Similar Documents

Publication Publication Date Title
US11422261B2 (en) Robot relocalization method and apparatus and robot using the same
CN108885791B (en) Ground detection method, related device and computer readable storage medium
CN109543493B (en) Lane line detection method and device and electronic equipment
US11045953B2 (en) Relocalization method and robot using the same
CN108197590B (en) Pavement detection method, device, terminal and storage medium
KR101995223B1 (en) System, module and method for detecting pedestrian, computer program
US9802539B2 (en) Distance and direction estimation of a target point from a vehicle using monocular video camera
CN113052907B (en) Positioning method of mobile robot in dynamic environment
CN112862890B (en) Road gradient prediction method, device and storage medium
CN114047487B (en) Radar and vehicle body external parameter calibration method and device, electronic equipment and storage medium
CN108319931B (en) Image processing method and device and terminal
CN108376384B (en) Method and device for correcting disparity map and storage medium
CN111046809B (en) Obstacle detection method, device, equipment and computer readable storage medium
CN114550042A (en) Road vanishing point extraction method, vehicle-mounted sensor calibration method and device
CN115546313A (en) Vehicle-mounted camera self-calibration method and device, electronic equipment and storage medium
CN114972427A (en) Target tracking method based on monocular vision, terminal equipment and storage medium
CN113643359A (en) Target object positioning method, device, equipment and storage medium
CN114919584A (en) Motor vehicle fixed point target distance measuring method and device and computer readable storage medium
CN116778458B (en) Parking space detection model construction method, parking space detection method, equipment and storage medium
CN110880003B (en) Image matching method and device, storage medium and automobile
CN112304322B (en) Restarting method after visual positioning failure and vehicle-mounted terminal
CN114037977B (en) Road vanishing point detection method, device, equipment and storage medium
Wang et al. Road edge detection based on improved RANSAC and 2D LIDAR Data
KR20210103865A (en) Vanishing point extraction device and method of extracting vanishing point
EP3229173B1 (en) Method and apparatus for determining a traversable path

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination