CN110807431A - Object positioning method and device, electronic equipment and storage medium

Object positioning method and device, electronic equipment and storage medium

Info

Publication number
CN110807431A
CN110807431A (application CN201911077837.5A)
Authority
CN
China
Prior art keywords
information, image, detected, preset, determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911077837.5A
Other languages
Chinese (zh)
Inventor
周康明
俞云杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Eye Control Technology Co Ltd
Priority to CN201911077837.5A
Publication of CN110807431A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects

Abstract

The application provides an object positioning method, an object positioning device, electronic equipment and a storage medium. The object positioning method is applied to terminal equipment that comprises a camera. First, an image to be detected is obtained through the camera; then, target detection information is determined according to the image to be detected and a preset detection model, the determined target detection information comprising three-dimensional position information of the object to be detected in a preset camera coordinate system; finally, relative position data between the object to be detected and the terminal equipment is determined according to the three-dimensional position information, so that the terminal equipment can position the object to be detected according to the relative position data. The object positioning method can position the object to be detected while detecting it in three dimensions, and this positioning capability provides important technical support for applying digital mapping technology to the technical fields of automation and intelligence, for example key technical support for intelligent driving, augmented reality and the like.

Description

Object positioning method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to an object positioning method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development and wide application of computer vision technology, target detection has become an increasingly active research field. Target detection is generally divided into detection of target objects in aerial images and detection of target objects in ground-shot images.
For target detection in ground-shot images, current work focuses mostly on image retrieval and scene analysis, with emphasis on the extraction and representation of structural features of the target object, while few techniques achieve three-dimensional detection and positioning of the target object in a ground-shot image.
However, detecting and positioning a target object in a ground-shot image is a technical problem to be solved in applying digital mapping technology to the rapidly developing fields of automation and intelligence.
Disclosure of Invention
The application provides an object positioning method, an object positioning device, an electronic device and a storage medium, which are used for solving the technical problem that three-dimensional target detection and positioning cannot be realized in the prior art.
In a first aspect, the present application provides an object positioning method, which is applied to a terminal device, where the terminal device includes a camera, and the method includes:
acquiring an image to be detected, wherein the image to be detected comprises a target image, the target image is an image corresponding to an object to be detected, and the camera is used for acquiring the image to be detected;
determining target detection information according to the image to be detected and a preset detection model, wherein the target detection information comprises three-dimensional position information of the object to be detected in a preset camera coordinate system, and the preset camera coordinate system is a coordinate system corresponding to the camera;
and determining relative position data between the object to be detected and the terminal equipment according to the three-dimensional position information so that the terminal equipment can position the object to be detected according to the relative position data.
In a possible design, after the determining, according to the three-dimensional position information, relative position data between the object to be measured and the terminal device, the method further includes:
and determining a running route of the vehicle according to the relative position data so as to enable the vehicle to avoid the object to be detected, wherein the terminal equipment is the vehicle.
Optionally, after the determining the relative position data between the object to be measured and the terminal device according to the three-dimensional position information, the method further includes:
determining superposition position information of augmented reality content according to the relative position data, and displaying an augmented reality scene image according to the augmented reality content, the reality scene image and the superposition position information, wherein the reality scene image is an image shot by the camera, and the reality scene image comprises the target image.
In one possible design, the determining target detection information according to the image to be detected and a preset detection model includes:
determining two-dimensional information of the target image in a preset image coordinate system according to the image to be detected and a preset detection model, wherein the two-dimensional information is used for representing the relative position of the target image in the image to be detected;
and determining the three-dimensional position information according to the two-dimensional information and the mapping relation between a preset camera coordinate system and the preset image coordinate system.
In one possible design, before determining target detection information according to the image to be detected and a preset detection model, the method further includes:
the method comprises the steps of training a preset convolutional neural network by utilizing a labeled data training set to generate a preset detection model, wherein the labeled data training set comprises a plurality of training images and labeled marking frame information, the labeled marking frame information comprises labeled marking frame position information, labeled marking frame size information and labeled observation angle information, and the labeled marking frame information is used for marking the position of a target image in the training images.
In one possible design, the training a preset convolutional neural network with a labeled data training set to generate the preset detection model includes:
determining prediction marking frame information according to the preset convolutional neural network, wherein the prediction marking frame information is used for representing the position of the target image in the training image, and the prediction marking frame information comprises position information of a prediction marking frame, size information of the prediction marking frame and prediction observation angle information;
determining a marking frame matching value according to the prediction marking frame information and the marking frame information;
selecting the predicted observation angle information corresponding to the predicted marking frame information with the marking frame matching value larger than the preset matching value to carry out discretization processing so as to generate a plurality of discrete observation angle information;
determining label observation angle information according to the marked observation angle information and the discrete observation angle information;
determining a loss value between the predicted observation angle information and the label observation angle information according to the label observation angle information and a preset loss function;
and updating parameters of the preset convolutional neural network according to the loss value to generate the preset detection model.
In one possible design, the determining two-dimensional information of the target image in a preset image coordinate system according to the image to be detected and a preset detection model includes:
and determining the two-dimensional information according to the image to be detected and a preset detection model, wherein the two-dimensional information comprises coordinate information of a target marking frame, size information of the target marking frame and angle information of a target observation angle.
In a second aspect, the present application provides an object positioning apparatus, which is applied to a terminal device, where the terminal device includes a camera, the apparatus includes:
the device comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring an image to be detected, the image to be detected comprises a target image, the target image is an image corresponding to an object to be detected, and the camera is used for acquiring the image to be detected;
the first determining module is used for determining target detection information according to the image to be detected and a preset detection model, wherein the target detection information comprises three-dimensional position information of the object to be detected in a preset camera coordinate system, and the preset camera coordinate system is a coordinate system corresponding to the camera;
and the second determining module is used for determining relative position data between the object to be detected and the terminal equipment according to the three-dimensional position information so that the terminal equipment can position the object to be detected according to the relative position data.
In one possible design, the object locating apparatus further includes:
and the third determining module is used for determining a running route of the vehicle according to the relative position data so that the vehicle avoids the object to be detected, and the terminal equipment is the vehicle.
Optionally, the object positioning apparatus further includes:
and the fourth determining module is used for determining the superposition position information of the augmented reality content according to the relative position data so as to display an augmented reality scene image according to the augmented reality content, the reality scene image and the superposition position information, wherein the reality scene image is an image shot by the camera, and the reality scene image comprises the target image.
In one possible design, the first determining module includes:
the first determining submodule is used for determining two-dimensional information of the target image in a preset image coordinate system according to the image to be detected and a preset detection model, and the two-dimensional information is used for representing the relative position of the target image in the image to be detected;
and the second determining submodule is used for determining the three-dimensional position information according to the two-dimensional information and the mapping relation between a preset camera coordinate system and the preset image coordinate system.
In one possible design, the object locating apparatus further includes:
the training module is used for training a preset convolutional neural network by utilizing a labeled data training set so as to generate the preset detection model, wherein the labeled data training set comprises a plurality of training images and labeled marking frame information, the labeled marking frame information comprises labeled marking frame position information, labeled marking frame size information and labeled observation angle information, and the labeled marking frame information is used for marking the position of a target image in the training images.
In one possible design, the training module is specifically configured to:
determining prediction marking frame information according to the preset convolutional neural network, wherein the prediction marking frame information is used for representing the position of the target image in the training image, and the prediction marking frame information comprises position information of a prediction marking frame, size information of the prediction marking frame and prediction observation angle information;
determining a marking frame matching value according to the prediction marking frame information and the marking frame information;
selecting the predicted observation angle information corresponding to the predicted marking frame information with the marking frame matching value larger than the preset matching value to carry out discretization processing so as to generate a plurality of discrete observation angle information;
determining label observation angle information according to the marked observation angle information and the discrete observation angle information;
determining a loss value between the predicted observation angle information and the label observation angle information according to the label observation angle information and a preset loss function;
and updating parameters of the preset convolutional neural network according to the loss value to generate the preset detection model.
In one possible design, the first determining submodule is specifically configured to:
and determining the two-dimensional information according to the image to be detected and a preset detection model, wherein the two-dimensional information comprises coordinate information of a target marking frame, size information of the target marking frame and angle information of a target observation angle.
In a third aspect, the present application provides an electronic device, comprising:
a camera;
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of object localization in accordance with the first aspect and optional aspects.
In a fourth aspect, the present application provides a non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the object localization method of the first aspect and optional aspects.
The application provides an object positioning method, an object positioning device, electronic equipment and a storage medium, which are applied to terminal equipment. Firstly, acquiring an image to be detected, wherein the image to be detected comprises a target image, the target image is an image corresponding to an object to be detected, then determining target detection information according to the image to be detected and a preset detection model, the target detection information comprises three-dimensional position information of the object to be detected in a preset camera coordinate system, a camera is used for acquiring the image to be detected, the preset camera coordinate system is a coordinate system corresponding to the camera, and finally, determining relative position data between the object to be detected and terminal equipment according to the three-dimensional position information of the object to be detected in the preset camera coordinate system, so that the terminal equipment can position the object to be detected according to the obtained relative position data. Therefore, the detection and the positioning of the object to be detected are realized, and important technical support is provided for the application of the digital mapping technology to the technical fields of automation and intelligence, for example, the key technical support is provided for the fields of intelligent driving, augmented reality and the like.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a schematic flowchart of an object positioning method according to an embodiment of the present application;
fig. 2 is a schematic diagram illustrating a relationship between a camera coordinate system and an image coordinate system according to an embodiment of the present disclosure;
fig. 3 is a schematic view of an application scenario of the object positioning method according to the embodiment of the present application;
fig. 4 is a schematic view of another application scenario of the object positioning method according to the embodiment of the present application;
fig. 5 is a schematic flowchart of determining target detection information according to an embodiment of the present disclosure;
fig. 6 is a schematic view of an observation angle of an object to be measured according to an embodiment of the present application;
fig. 7 is a schematic flowchart of a process for generating a preset detection model according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of an object positioning apparatus according to an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
With the foregoing drawings in mind, certain embodiments of the disclosure have been shown and described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of methods and apparatus consistent with certain aspects of the present application, as detailed in the appended claims.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The technical solution of the present application will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 1 is a schematic flowchart of an object positioning method according to an embodiment of the present disclosure. The object positioning method may be applied to a terminal device that includes a camera, where the terminal device may be a vehicle, an Augmented Reality (AR) device, a mobile phone, a computer, a smart watch, a tablet computer, and the like; the vehicle may be an intelligent driving vehicle, such as an unmanned driving vehicle, and the AR device may be AR glasses, wearable AR equipment, or the like. The terminal device may be a wireless terminal or a wired terminal. Any terminal device provided with a camera may be used, and the embodiment of the application is not limited in this respect.
As shown in fig. 1, the object positioning method provided by this embodiment includes the following steps:
s11: and acquiring an image to be detected.
The image to be detected comprises a target image, the target image is an image corresponding to the object to be detected, and the camera is used for obtaining the image to be detected.
The object positioning method provided by this embodiment is applied to a terminal device, where the terminal device includes a camera, and the camera is used to obtain an image to be detected, where the image to be detected includes a target image, and the target image is an image corresponding to an object to be detected, where the object to be detected may be an object having a regular shape. Specifically, for example, a camera of the terminal device is used to take a picture including an object to be measured, and assuming that the object to be measured is a building, the taken picture is an image to be measured, and an image of the building in the picture is a target image.
Optionally, the image to be measured acquired in the embodiment of the present application may be a color image, a grayscale image, or a black-and-white image, which is not limited in the embodiment of the present application.
S12: and determining target detection information according to the image to be detected and a preset detection model.
The target detection information comprises three-dimensional position information of the object to be detected in a preset camera coordinate system, and the preset camera coordinate system is a coordinate system corresponding to the camera.
It should be understood that the preset camera coordinate system is the coordinate system corresponding to the camera. A camera coordinate system is a three-dimensional rectangular coordinate system established by taking the focusing center of the camera as the origin and the optical axis as the Z axis; the camera here is the camera on the terminal device in the embodiment of the present application. The camera coordinate system is also called the observation coordinate system: its origin is the optical center of the camera, its X axis and Y axis are respectively parallel to the x axis and y axis of the image, and its Z axis is the optical axis of the camera and is perpendicular to the image plane. The intersection point of the optical axis and the image plane is the origin of the image coordinate system, and the image coordinate system is a two-dimensional rectangular coordinate system.
Fig. 2 is a schematic diagram illustrating the relationship between a camera coordinate system and an image coordinate system according to an embodiment of the present application. As shown in fig. 2, the preset camera coordinate system is X_C-Y_C-Z_C, where the optical center of the camera is the origin O_C of the preset camera coordinate system, the X axis is X_C, the Y axis is Y_C and the Z axis is Z_C. P is the top vertex of the object to be measured and AB is the longest bottom edge of the object to be measured; if the object to be measured is a building and the building is a cuboid, P is the top vertex of the building and AB is the length of the building, and the coordinates of the vertex P in the preset camera coordinate system are (X_C, Y_C, Z_C).
When shooting is performed, the preset image coordinate system corresponding to the preset camera coordinate system is the x-y coordinate system shown in fig. 2, where the image coordinate system is a rectangular coordinate system established at the image center point with pixels as the unit, o is the origin of the image coordinate system, the abscissa is the x axis and the ordinate is the y axis. As described above, if P is the vertex of the object to be measured, p is the corresponding point of the vertex P in the image coordinate system, i.e. the coordinates of the point p in the preset image coordinate system are (x, y). It is understood that, in fig. 2, f is the focal length of the camera used for shooting.
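For illustration, the following Python sketch expresses the standard pinhole projection implied by fig. 2 (this is general camera geometry, not a formula stated verbatim in this application): a point (X_C, Y_C, Z_C) in the preset camera coordinate system maps to a point (x, y) in the preset image coordinate system through the focal length f.

```python
# Illustrative sketch (standard pinhole model, not taken verbatim from this application):
# relates a point (X_C, Y_C, Z_C) in the preset camera coordinate system to its
# projection (x, y) in the preset image coordinate system, as depicted in fig. 2.

def project_to_image(X_C: float, Y_C: float, Z_C: float, f: float) -> tuple[float, float]:
    """Project a 3D point in the camera coordinate system onto the image plane.

    f is the focal length of the camera; Z_C must be positive (point in front of the camera).
    """
    if Z_C <= 0:
        raise ValueError("Point must lie in front of the camera (Z_C > 0).")
    x = f * X_C / Z_C
    y = f * Y_C / Z_C
    return x, y
```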
After the image to be detected including the target image is acquired, target detection information can be determined according to the image to be detected and a preset detection model. The target detection information comprises three-dimensional position information of the object to be detected in a preset camera coordinate system. It can be understood that the shape of the object to be measured is similar to a cuboid, and the coordinates of the central points of the upper and lower surfaces of the cuboid in the preset camera coordinate system are determined, that is, the three-dimensional position information of the object to be measured in the preset camera coordinate system is determined.
It is to be understood that the preset detection model may be a model that outputs picture data information according to image information.
S13: and determining relative position data between the object to be detected and the terminal equipment according to the three-dimensional position information so that the terminal equipment can position the object to be detected according to the relative position data.
Target detection information is determined according to the image to be detected and the preset detection model, and the target detection information includes three-dimensional position information of the object to be detected in the preset camera coordinate system. Specifically, assuming the object to be detected is a building whose shape approximates a cuboid, the coordinates of the central points of the upper and lower surfaces of that cuboid in the preset camera coordinate system constitute the three-dimensional position information of the object to be detected in the preset camera coordinate system. Since the preset camera coordinate system is the coordinate system corresponding to the camera, i.e. it is determined by the position of the camera on the terminal device when the image to be measured is acquired, the relative position data between the object to be measured and the terminal device can be determined from the three-dimensional position information. The relative position data describes the relation between the position of the camera when the image to be measured is acquired and the actual position of the object to be measured in three-dimensional space; for example, the object may be 200 meters directly in front of the terminal device, or 300 meters away at 45 degrees to the left front of the terminal device, and so on.
After the relative position data between the object to be measured and the terminal device is obtained, the terminal device is used as a reference point, and the terminal device can position the object to be measured according to the relative position data.
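As an illustration only (the application does not prescribe this particular computation), relative position data of the kind described above, such as a distance and a bearing relative to the terminal device, can be derived from the three-dimensional position in the preset camera coordinate system. The sketch below assumes Z_C points straight ahead of the camera and X_C points to the right.

```python
import math

# Minimal sketch (an assumption, not the patent's specified computation) of turning the
# 3D position of the object in the camera coordinate system into relative position data:
# a range in meters and a bearing relative to the terminal device.

def relative_position(X_C: float, Y_C: float, Z_C: float) -> tuple[float, float]:
    """Return (distance_m, bearing_deg); bearing is negative to the left, positive to the right."""
    distance = math.sqrt(X_C ** 2 + Y_C ** 2 + Z_C ** 2)
    bearing = math.degrees(math.atan2(X_C, Z_C))
    return distance, bearing

# Example: an object roughly 300 m away, about 45 degrees to the left-front of the device.
d, b = relative_position(X_C=-212.1, Y_C=0.0, Z_C=212.1)
```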
Optionally, the object positioning method provided in the embodiment of the present application may be implemented in real time, and it may be understood that the terminal device obtains the image to be detected in real time through the camera, and the terminal device positions the object to be detected in real time according to the relative position data.
The object positioning method provided by this embodiment is applied to a terminal device, which includes a camera, and acquires an image to be detected through the camera, where the image to be detected includes a target image, the target image is an image corresponding to an object to be detected, then determines target detection information according to the image to be detected and a preset detection model, the determined target detection information includes three-dimensional position information of the object to be detected in a preset camera coordinate system, and the preset camera coordinate system is a coordinate system corresponding to the camera, and finally determines relative position data between the object to be detected and the terminal device according to the three-dimensional position information, so that the terminal device positions the object to be detected according to the relative position data. The object positioning method provided by the embodiment can realize positioning while detecting the object to be detected, and can provide important technical support for applying the digital mapping technology to the automation and intelligent technical field by realizing positioning of the object to be detected, for example, provide key technical support for the fields of intelligent driving, augmented reality and the like.
Optionally, on the basis of the embodiment shown in fig. 1, after determining the relative position data between the object to be measured and the terminal device according to the three-dimensional position information, the object positioning method provided in the embodiment of the present application further includes:
and determining the driving route of the vehicle according to the relative position data so that the vehicle avoids the object to be detected, wherein the terminal equipment is the vehicle.
Fig. 3 is a schematic view of an application scenario of the object positioning method according to the embodiment of the present application. As shown in fig. 3, in this embodiment the terminal device is a vehicle 1 and the object to be detected is a building 2. When the vehicle 1 travels on a route that includes the building 2, the vehicle 1 acquires an image containing the building 2 through a camera arranged on the vehicle and positions the building 2, i.e. obtains the three-dimensional position information of the building 2. Once the three-dimensional position information is determined, the relative position data between the building 2 and the vehicle 1 can be obtained; for example, the building 2 is located 200 meters directly in front of the current position of the vehicle 1, or 300 meters away at 45 degrees to the left front of the current position of the vehicle. The vehicle 1 can therefore plan its driving route more accurately according to the obtained relative position data, so that the vehicle 1 avoids the building 2.
The object positioning method provided by this embodiment can determine the driving route of the vehicle according to the relative position data, so that the vehicle avoids the object to be detected, and the safe driving of the vehicle is ensured, especially in the field of intelligent driving, such as the field of unmanned driving, the object positioning method provided by this embodiment can provide important technical support for the unmanned driving technology.
Optionally, fig. 4 is a schematic view of another application scenario of the object positioning method according to the embodiment of the present application, and as shown in fig. 4, after determining the relative position data between the object to be measured and the terminal device according to the three-dimensional position information, the method further includes:
determining the superposition position information of the augmented reality content according to the relative position data, and displaying an augmented reality scene image according to the augmented reality content, the reality scene image and the superposition position information, wherein the reality scene image is an image shot by a camera and comprises a target image.
As shown in fig. 4, the terminal device in this embodiment is AR glasses 3, and the object to be measured is a building 4. After the AR glasses 3 determine the three-dimensional position information of the building 4, the relative position data between the AR glasses 3 and the building 4 can be determined from the three-dimensional position information, the superimposed position information of the augmented reality content can be determined according to the relative position data, and the augmented reality scene image is displayed according to the augmented reality content, the real scene image and the superimposed position information. In other words, when a virtual object is to be superimposed into the real scene with the correct spatial perspective relation, the AR glasses 3 can determine the exact position at which the virtual object should be placed in the real scene according to the relative position data between the AR glasses 3 and the building 4. After that position is determined, the virtual information rendering system of the AR glasses 3 renders the virtual object, and finally the virtual-real fusion display system superimposes the virtual object onto the content of the real scene for display, that is, the augmented reality scene image is displayed. The augmented reality content is the virtual object, and the real scene image is the image, captured by the camera, that contains the building 4.
According to the object positioning method provided by the embodiment, the superposition position information of the augmented reality content can be determined according to the relative position data, so that the augmented reality scene image is displayed according to the augmented reality content, the reality scene image and the superposition position information, the displayed augmented reality scene image is more suitable for the actual scene, and the user experience of using the augmented reality device is improved.
In a possible design, step S12 may be implemented by the steps shown in fig. 5, where fig. 5 is a schematic flowchart of a process for determining target detection information provided in an embodiment of the present application, and as shown in fig. 5, the process includes the following steps:
s121: and determining two-dimensional information of the target image in a preset image coordinate system according to the image to be detected and a preset detection model, wherein the two-dimensional information is used for representing the relative position of the target image in the image to be detected.
Determining two-dimensional information of the target image in the preset image coordinate system according to the image to be detected and the preset detection model, which can be understood as determining two-dimensional information of the target image in the image to be detected in the preset image coordinate system according to the image to be detected and the preset detection model, where the two-dimensional information may include information that can specify a position relationship of the target image in the image to be detected, such as coordinate information, size information, observation angle information, and the like of the target image. The two-dimensional information is used for representing the relative position of the target image in the image to be measured.
Further, in order to make the two-dimensional information of the target image in the image to be detected more clear, optionally, determining the two-dimensional information of the target image in the preset image coordinate system according to the image to be detected and the preset detection model, including:
and determining two-dimensional information according to the image to be detected and a preset detection model, wherein the two-dimensional information comprises coordinate information of a target marking frame, size information of the target marking frame and angle information of a target observation angle.
It can be understood that, when the target image is marked with a marking frame, the marking frame that marks the target image is the target marking frame. In other words, the target marking frame marks the image corresponding to the object to be measured in the image to be measured, i.e. the target image. The observation angle is the included angle between the line connecting the optical center of the camera used for shooting with the center point of the object to be measured and the plane where the object to be measured is located, as shown in fig. 6; fig. 6 is a schematic view of the observation angle of the object to be measured provided in this embodiment of the present application. The camera position in fig. 6 is the position of the camera of the terminal device, and the object to be measured is shown taking a building as an example. Therefore, the two-dimensional information of the target image in the preset image coordinate system can be determined according to the image to be detected and the preset detection model; that is, the coordinate information of the target marking frame, the size information of the target marking frame and the angle information of the target observation angle can all be determined from the image to be detected and the preset detection model. For example, the target marking frame is denoted B_2d, the coordinates of its center point in the preset image coordinate system are (x_2d, y_2d), the size information h_2d and w_2d respectively indicate the height and width of the target marking frame, and the angle information of the target observation angle is α.
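As a purely illustrative aid, the two-dimensional information described above can be collected in a small record; the field names below are assumptions, not identifiers defined by this application.

```python
from dataclasses import dataclass

# Hypothetical container (field names are illustrative) for the two-dimensional
# information B_2d described above: the target marking frame's center (x_2d, y_2d),
# its height h_2d and width w_2d, and the target observation angle alpha.

@dataclass
class TargetBox2D:
    x_2d: float   # center x in the preset image coordinate system (pixels)
    y_2d: float   # center y in the preset image coordinate system (pixels)
    h_2d: float   # height of the target marking frame (pixels)
    w_2d: float   # width of the target marking frame (pixels)
    alpha: float  # target observation angle (radians)
```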
S122: and determining three-dimensional position information according to the two-dimensional information and the mapping relation between the preset camera coordinate system and the preset image coordinate system.
As described above in the concepts of the preset camera coordinate system and the preset image coordinate system, it can be seen that the following mapping relationship exists between the preset camera coordinate system and the preset image coordinate system:
Continuing with fig. 2, point B and point P are each connected to the origin O_C of the preset camera coordinate system, and the intersection of the straight line O_C B with the x-axis of the image coordinate system is point C. Several triangles are thereby formed (denoted by Δ): ΔABO_C, ΔoCO_C, ΔPBO_C and ΔpCO_C. According to the triangle similarity principle, the following similarity relations hold between these triangles:
ΔABO_C ~ ΔoCO_C    (1)
ΔPBO_C ~ ΔpCO_C    (2)
According to the above similarity relations, proportional relations hold between the corresponding sides of these triangles; they are given as equations (3)-(5) in the original document, which are rendered there as images and are not reproduced here.
further, assuming that the shape of the object to be measured can approximate a rectangular parallelepiped, the size of the rectangular parallelepiped can be expressed as L ═ w, h, L, where w, h, L are the width, height, and length of the rectangular parallelepiped, respectively. The projection of the image on the preset image coordinate system is a rectangle. In the object positioning method provided by the embodiment of the application, when a large amount of data acquired in the implementation of the method is counted, it is found that a stable projection close to the upper midpoint of the rectangle exists in the preset image coordinate system for the central point of the upper surface of the cuboid, and a similar projection close to the lower midpoint of the rectangle exists in the preset image coordinate system for the central point of the lower surface of the cuboid. When the object to be measured is farther away from the camera of the terminal equipment, the projection of the central points of the upper surface and the lower surface of the cuboid to which the object to be measured is similar is closer to the central point of the upper side and the lower side of the rectangle; when the observation angle a of the object to be measured relative to the camera is smaller and smaller, the projection of the central points of the upper surface and the lower surface of the cuboid to which the object to be measured is similar is closer to the middle points of the upper side and the lower side of the rectangle.
Accordingly, a parameter λ can be defined that is related to the width of the rectangle and the viewing angle α by equation (6) of the original document (rendered there as an image and not reproduced here), where h_0 and α_0 are statistics estimated from a data set consisting of a large amount of data, and w_1 and w_2 are weight parameters.
Owing to the projection relationship, the width h_2d of the rectangle is in fact the height of the target marking frame described above, and the viewing angle α is the angle of the target observation angle. The target marking frame B_2d has center point coordinates (x_2d, y_2d), and its height and width are respectively h_2d and w_2d. Combining equations (1)-(6), the coordinates of the central point of the upper surface of the cuboid corresponding to the target marking frame in the preset image coordinate system can be obtained, and likewise the coordinates of the central point of the lower surface of the cuboid in the preset image coordinate system (both expressions are rendered as images in the original document and are not reproduced here).
The coordinates of the central points of the upper and lower surfaces of the cuboid in the preset image coordinate system are the coordinates of the central points of the upper and lower surfaces of the object to be measured in the preset image coordinate system.
Homogenizing the coordinates of the central points of the upper and lower surfaces of the object to be measured gives the homogenized coordinates of the central point of the upper surface and, similarly, the homogenized coordinates of the central point of the lower surface (both rendered as images in the original document and not reproduced here).
After the homogenized coordinates of the central points of the upper and lower surfaces of the object to be measured are obtained, the normalized coordinates of these central points can be obtained using the internal reference (intrinsic) matrix K of the camera of the terminal equipment (the expression is rendered as an image in the original document and is not reproduced here).
thus, the normalized height of the object to be measured can be obtained
Figure BDA0002263042220000117
Namely:
Figure BDA0002263042220000118
Denoting by z the distance between the plane of the camera of the terminal device and the object to be measured, there is a relation between z, the normalized height and the actual height h of the object to be measured (the relation is rendered as an image in the original document and is not reproduced here).
In summary, the coordinates of the central points of the upper and lower surfaces of the object to be measured in the preset camera coordinate system can be expressed accordingly (the expressions are rendered as images in the original document and are not reproduced here).
By combining formulas (7)-(11), the coordinates of the central points of the upper and lower surfaces of the object to be measured in the preset camera coordinate system can be expressed mathematically through the coordinate information of the target marking frame; that is, those coordinates are obtained. Once the coordinates of the upper and lower surface central points in the preset camera coordinate system are known, the position of an object whose shape approximates a cuboid is effectively fixed in the preset camera coordinate system, and the three-dimensional position information of the object to be measured in the preset camera coordinate system is thereby obtained.
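For illustration, the following Python sketch follows the spirit of the above derivation under explicit assumptions, since equations (7)-(11) are available only as images in the original text: the centers of the upper and lower surfaces are taken as the midpoints of the top and bottom edges of the target marking frame (the parameter λ of equation (6) is omitted for simplicity), normalization uses the camera intrinsic matrix K, and the depth z is recovered from an assumed known real height h of the object as h divided by the normalized height. It reuses the TargetBox2D record sketched earlier.

```python
import numpy as np

# Sketch of the 2D-to-3D recovery described above, written under the stated assumptions
# because the original equations are only available as images. This is not a verbatim
# implementation of the patent's formulas.

def locate_object(box: "TargetBox2D", K: np.ndarray, real_height_m: float):
    """Return the centers of the object's upper and lower surfaces in the camera frame."""
    # Midpoints of the top and bottom edges of the target marking frame (homogenized image coords).
    top = np.array([box.x_2d, box.y_2d - box.h_2d / 2.0, 1.0])
    bottom = np.array([box.x_2d, box.y_2d + box.h_2d / 2.0, 1.0])

    # Normalized coordinates via the intrinsic matrix K.
    K_inv = np.linalg.inv(K)
    top_n = K_inv @ top
    bottom_n = K_inv @ bottom

    # Normalized height and recovered depth z (assumed relation: z = real height / normalized height).
    h_norm = np.linalg.norm(top_n - bottom_n)
    z = real_height_m / h_norm

    # Back-project: 3D coordinates of the two surface centers in the preset camera coordinate system.
    top_3d = z * top_n
    bottom_3d = z * bottom_n
    return top_3d, bottom_3d
```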
It should be noted that, for convenience of description, the shape of the object to be measured is taken to approximate a cuboid in the embodiment of the present application; the object positioning method provided in the embodiment of the present application is equally applicable to an object to be measured whose shape approximates a cube. If the shape of the object to be measured approximates another hexahedron, for example a quadrangular frustum, the projection obtained with the camera as the reference point is a trapezoid rather than a rectangle or square. The present embodiment is not limited in this respect.
The embodiment provides a method for determining target detection information according to an image to be detected and a preset detection model, which includes determining two-dimensional information of the target image in a preset image coordinate system according to the image to be detected and the preset detection model, and then determining three-dimensional position information of an object to be detected in the preset camera coordinate system according to the two-dimensional information and a mapping relation between the preset camera coordinate system and the preset image coordinate system. Therefore, the detection and the positioning of the object to be detected are realized, and the positioning of the object to be detected can provide important technical support for the application of the digital mapping technology to the technical fields of automation and intelligence, such as intelligent driving, augmented reality and the like.
On the basis of the foregoing embodiment, optionally, before determining the target detection information according to the image to be detected and the preset detection model, the method further includes:
the method comprises the steps of training a preset convolutional neural network by utilizing a labeling data training set to generate a preset detection model, wherein the labeling data training set comprises a plurality of training images and labeling marking frame information, the labeling marking frame information comprises labeling marking frame position information, labeling marking frame size information and labeling observation angle information, and the labeling marking frame information is used for marking the position of a target image in the training images.
Before determining target detection information according to the image to be detected and the preset detection model, the preset convolutional neural network can be trained by using the labeled data training set to generate the preset detection model. The labeled data training set includes a plurality of training images and labeled frame information, and it can be understood that a plurality of training images are obtained, that is, a plurality of images including the object to be detected are photographed, for example, images including the object to be detected are photographed from different angles, such as front, side and back, or are photographed at different distances from the object to be detected, and the like.
The obtained multiple images are taken as training images, and the image corresponding to the object to be detected in each training image is marked with a marking frame, yielding the labeled marking frame. Once the marking frame is determined, the labeled marking frame information can be obtained: it comprises the position information of the marking frame, the size information of the marking frame and the labeled observation angle information. The position information of the marking frame consists of the coordinates of the upper-left and lower-right corners of the marking frame in the preset image coordinate system, and the labeled observation angle information is the included angle, at the moment the training image is captured, between the line connecting the optical center of the shooting camera with the central point of the object to be measured and the plane where the object to be measured is located. Marking the training image with the labeled marking frame information marks the position of the target image in the training image.
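Purely as an illustration, one entry of such a labeled data training set could be laid out as follows; the field names are assumptions, not identifiers defined by this application.

```python
from dataclasses import dataclass

# Hypothetical layout of a single entry in the labeled data training set: the training
# image plus the labeled marking frame information described above, i.e. box corners in
# the preset image coordinate system and the labeled observation angle.

@dataclass
class TrainingAnnotation:
    image_path: str           # the training image
    x_min: float              # upper-left corner x of the labeled marking frame
    y_min: float              # upper-left corner y of the labeled marking frame
    x_max: float              # lower-right corner x of the labeled marking frame
    y_max: float              # lower-right corner y of the labeled marking frame
    observation_angle: float  # labeled observation angle (radians)
```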
In the object positioning method provided in this embodiment, before determining target detection information according to an image to be detected and a preset detection model, a preset convolutional neural network is trained by using a labeled data training set to generate the preset detection model, so that the improvement on the preset convolutional neural network is realized, and the generated preset detection model can accurately predict target observation angle information of the target image.
Optionally, fig. 7 is a schematic flowchart of a process of generating a preset detection model according to an embodiment of the present disclosure, and as shown in fig. 7, the embodiment provides a method of training a preset convolutional neural network by using an annotated image training set to generate a preset detection model, including:
s401: and determining the information of the prediction marking frame according to a preset convolutional neural network.
The prediction marking frame information is used for representing the position of the target image in the training image, and comprises position information of the prediction marking frame, size information of the prediction marking frame and prediction observation angle information.
The prediction marking frame information is determined according to the preset convolutional neural network, and it can be understood that the obtained training image including the object to be detected is subjected to object detection through the preset convolutional neural network, and the obtained prediction marking frame information can represent the position of the target image in the training image, for example, the position of the target image is represented by using the coordinate information of the upper left corner and the lower right corner of the prediction marking frame in a preset image coordinate system. The predicted marking frame information includes position information of the predicted marking frame, size information of the predicted marking frame, and a predicted observation angle.
S402: and determining a marking frame matching value according to the prediction marking frame information and the marking frame information.
In the field of target detection technology, Intersection-over-Union (IoU) is commonly used to characterize the overlap between a candidate box generated during detection and the original labeled box (ground truth box), i.e. the ratio of their intersection to their union. The optimal situation is complete overlap, i.e. a ratio of 1.
Specifically, in the embodiment of the present application, the labeled marking frame information plays the role of the original (ground truth) box information, and the predicted marking frame information is the candidate box information generated during target detection. The marking frame matching value is determined according to the predicted marking frame information and the labeled marking frame information: the preset convolutional neural network performs target detection on the training image to obtain the predicted marking frame information, and at the same time an overlap ratio, namely the marking frame matching value in this embodiment, is obtained; the higher the overlap ratio, the higher the degree of matching between the predicted marking frame information and the labeled marking frame information.
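A minimal sketch of the Intersection-over-Union computation follows; box coordinates are assumed to be (x_min, y_min, x_max, y_max) in the preset image coordinate system.

```python
# Minimal sketch of IoU used as the marking frame matching value: the ratio of the overlap
# area to the union area of the predicted box and the labeled (ground truth) box.

def iou(box_a: tuple[float, float, float, float],
        box_b: tuple[float, float, float, float]) -> float:
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (zero area if the boxes do not intersect).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```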
S403: and selecting the predicted observation angle information corresponding to the predicted marking frame information with the marking frame matching value larger than the preset matching value to carry out discretization processing so as to generate a plurality of discrete observation angle information.
After the marking frame matching value is determined from the predicted marking frame information and the labeled marking frame information, the predicted marking frame information whose matching value is greater than the preset matching value is selected, and the corresponding predicted observation angle information in the selected predicted marking frame information is discretized to generate a plurality of pieces of discrete observation angle information. The discretization interval can be [-π, π]; after discretization, the observation angle information is split into a plurality of pieces, so that a plurality of pieces of discrete observation angle information are obtained.
S404: and determining the observation angle information of the label according to the marked observation angle information and the discrete observation angle information.
After the plurality of pieces of discrete observation angle information is obtained, the difference between the labeled observation angle information and each piece of discrete observation angle information is computed; the obtained differences are the label observation angle information, which can be denoted Δα_g.
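For illustration, the discretization of [-π, π] and the construction of the label observation angle Δα_g could look as follows; the number of bins is an assumption, since this application does not fix it.

```python
import math

# Sketch of the angle discretization and label construction described in S403-S404.
# The number of bins is an assumption; the interval [-pi, pi] is split into equal bins,
# and the label observation angle is the residual of the labeled angle w.r.t. each bin.

def discretize_angles(num_bins: int = 12) -> list[float]:
    """Return the centers of num_bins equal bins covering [-pi, pi]."""
    bin_width = 2.0 * math.pi / num_bins
    return [-math.pi + (i + 0.5) * bin_width for i in range(num_bins)]

def angle_labels(labeled_angle: float, bin_centers: list[float]) -> list[float]:
    """Label observation angle information: difference of the labeled angle from each bin center."""
    return [labeled_angle - c for c in bin_centers]
```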
S405: and determining a loss value between the predicted observation angle information and the label observation angle information according to the label observation angle information and a preset loss function.
After the label observation angle information is obtained, a loss value between the predicted observation angle information and the label observation angle information is determined using a preset loss function. Optionally, the preset loss function may be a smooth L1 loss function.
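A sketch of a smooth L1 loss of the kind mentioned above follows; the exact form used by this application is not specified beyond "smoothed L1", so the beta parameter here is an assumption.

```python
# Sketch of the smooth L1 loss applied to the difference between the predicted observation
# angle information and the label observation angle information.

def smooth_l1(predicted: float, target: float, beta: float = 1.0) -> float:
    diff = abs(predicted - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta
```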
S406: and updating parameters of the preset convolutional neural network according to the loss value to generate a preset detection model.
After the loss value between the predicted observation angle information and the label observation angle information is obtained, the parameters of the preset convolutional neural network are updated using this loss value to generate the preset detection model. Because a branch for predicting the observation angle information is added in the generation process of the preset detection model, the preset detection model can accurately predict the observation angle. Through continuous parameter updating, target detection on the image to be detected by the generated preset detection model can reach the expected accuracy.
In the method for generating the preset detection model by training the preset convolutional neural network with the labeled image training set, the preset convolutional neural network first determines predicted marking frame information that represents the position of the target image in the training image, the predicted marking frame information including position information of the predicted marking frame, size information of the predicted marking frame and predicted observation angle information; a marking frame matching value is determined according to the predicted marking frame information and the labeled marking frame information; the predicted observation angle information corresponding to the predicted marking frames whose matching value is greater than the preset matching value is selected and discretized to generate a plurality of pieces of discrete observation angle information; label observation angle information is determined according to the labeled observation angle information and the discrete observation angle information; a loss value between the predicted observation angle information and the label observation angle information is determined by using the preset loss function; and finally the parameters of the preset convolutional neural network are updated according to the obtained loss value to generate the preset detection model. The preset detection model provided by this embodiment adds a new branch that predicts observation angle information, so that accurate information on the target observation angle of the target image can be obtained when target detection is performed on the image to be detected through the preset detection model.
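Putting these steps together, a schematic parameter-update loop is sketched below. This is not the patent's implementation: the network, data loader, and the helpers box_iou_batch and discretize are hypothetical placeholders, a one-to-one pairing of predicted and labeled frames is assumed for brevity, and PyTorch is used only as familiar notation.

```python
import torch

def train_preset_detection_model(network, data_loader, epochs=10, matching_threshold=0.5):
    """Schematic training loop: predict marking frames, keep those whose
    matching value exceeds the preset matching value, build the label
    observation angles (delta alpha_g), and update the network parameters
    from the smooth L1 loss on the observation-angle branch."""
    optimizer = torch.optim.SGD(network.parameters(), lr=1e-3, momentum=0.9)
    criterion = torch.nn.SmoothL1Loss()
    for _ in range(epochs):
        for images, labeled_boxes, labeled_angles in data_loader:
            pred_boxes, pred_angles = network(images)         # predicted marking frame information
            match = box_iou_batch(pred_boxes, labeled_boxes)  # matching values (placeholder helper)
            keep = match > matching_threshold                 # frames above the preset matching value
            discrete = discretize(pred_angles[keep])          # discrete angles (placeholder helper)
            label_angles = labeled_angles[keep] - discrete    # label observation angles (delta alpha_g)
            loss = criterion(pred_angles[keep], label_angles) # loss between predicted and label angles
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```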
Fig. 8 is a schematic structural diagram of an object positioning apparatus according to an embodiment of the present application, and as shown in fig. 8, an object positioning apparatus 80 according to the embodiment includes:
the acquiring module 81 is configured to acquire an image to be detected, where the image to be detected includes a target image, the target image is an image corresponding to an object to be detected, and the camera is configured to acquire the image to be detected;
the first determining module 82 is configured to determine target detection information according to the image to be detected and a preset detection model, where the target detection information includes three-dimensional position information of the object to be detected in a preset camera coordinate system, and the preset camera coordinate system is a coordinate system corresponding to the camera.
The second determining module 83 is configured to determine, according to the three-dimensional position information, relative position data between the object to be detected and the terminal device, so that the terminal device positions the object to be detected according to the relative position data.
The implementation principle and technical effect of the object positioning device provided in this embodiment are similar to those of the embodiment shown in fig. 1, and are not described herein again.
In one possible design, the object positioning apparatus 80 provided in the embodiment of the present application further includes:
and a third determining module 84, configured to determine a driving route of the vehicle according to the relative position data, so that the vehicle avoids the object to be detected, where the terminal device is the vehicle.
The implementation principle and technical effect of the object positioning device provided in this embodiment are similar to those of the embodiment shown in fig. 3, and are not described herein again.
Optionally, the object positioning apparatus 80 provided in this embodiment of the present application further includes:
the fourth determining module 85 is configured to determine, according to the relative position data, overlay position information of the augmented reality content, so as to display an augmented reality scene image according to the augmented reality content, the reality scene image, and the overlay position information, where the reality scene image is an image captured by the camera, and the reality scene image includes the target image.
The implementation principle and technical effect of the object positioning device provided in this embodiment are similar to those of the embodiment shown in fig. 4, and are not described herein again.
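As a hedged illustration of how the fourth determining module might use the relative position data, the three-dimensional position can be projected back into the real scene image to obtain the overlay position for the augmented reality content. The pinhole intrinsic parameters (fx, fy, cx, cy) are placeholders here, not values given by the patent:

```python
import numpy as np

def overlay_position(relative_xyz, fx, fy, cx, cy):
    """Project the relative 3D position of the object to be detected into
    image coordinates; the resulting (u, v) is where the augmented reality
    content is overlaid on the real scene image."""
    x, y, z = relative_xyz
    u = fx * x / z + cx
    v = fy * y / z + cy
    return np.array([u, v])
```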
In one possible design, the first determining module 82 provided in this embodiment of the present application includes:
the first determining submodule 821 is configured to determine, according to the image to be detected and the preset detection model, two-dimensional information of the target image in a preset image coordinate system, where the two-dimensional information is used to represent a relative position of the target image in the image to be detected.
The second determining submodule 822 is configured to determine three-dimensional position information according to the two-dimensional information and a mapping relationship between a preset camera coordinate system and a preset image coordinate system.
Optionally, the first determining submodule 821 provided in the embodiment of the present application is specifically configured to:
and determining two-dimensional information according to the image to be detected and a preset detection model, wherein the two-dimensional information comprises coordinate information of a target marking frame, size information of the target marking frame and angle information of a target observation angle.
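One common form of the mapping relationship between the preset image coordinate system and the preset camera coordinate system is the pinhole-camera intrinsic model. The following minimal sketch recovers a three-dimensional position from two-dimensional image coordinates under that assumption; the intrinsic parameters and the depth value are placeholders, and how the depth is estimated is not specified here:

```python
import numpy as np

def image_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with a known or estimated depth into the
    preset camera coordinate system using pinhole intrinsics (fx, fy, cx, cy)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])
```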
In one possible design, the object positioning apparatus 80 provided in the embodiment of the present application further includes:
the training module 86 is configured to train the preset convolutional neural network by using a labeled data training set to generate a preset detection model, where the labeled data training set includes a plurality of training images and labeled marking frame information, the labeled marking frame information includes labeled marking frame position information, labeled marking frame size information and labeled observation angle information, and the labeled marking frame information is used to mark the position of the target image in the training images.
Optionally, the training module 86 provided in the embodiment of the present application is specifically configured to:
determining prediction marking frame information according to a preset convolutional neural network, wherein the prediction marking frame information is used for representing the position of a target image in a training image, and the prediction marking frame information comprises position information of a prediction marking frame, size information of the prediction marking frame and prediction observation angle information;
determining a marking frame matching value according to the prediction marking frame information and the marking frame information;
selecting predicted observation angle information corresponding to predicted marking frame information with a marking frame matching value larger than a preset matching value to carry out discretization processing so as to generate a plurality of discrete observation angle information;
determining label observation angle information according to the marked observation angle information and the discrete observation angle information;
determining a loss value between the predicted observation angle information and the label observation angle information according to the label observation angle information and a preset loss function;
and updating parameters of the preset convolutional neural network according to the loss value to generate a preset detection model.
The implementation principle and technical effect of the training module in the object positioning device provided by this embodiment are similar to those of the embodiment shown in fig. 7, and are not described herein again.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device of this embodiment may be used to execute the object positioning method provided in the method embodiments. As shown in fig. 9 (one processor is taken as an example), the electronic device 500 provided in this embodiment includes a camera 501; at least one processor 502; and a memory 503 communicatively coupled to the at least one processor 502. The memory 503 stores instructions executable by the at least one processor 502, and the instructions are executed by the at least one processor 502, so that the at least one processor 502 can execute the steps of the object positioning method in the foregoing embodiments; for details, reference may be made to the related description of the foregoing method embodiments.
In an exemplary embodiment, the present application provides a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the steps of the object positioning method in the above embodiments. For example, the readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. An object positioning method is applied to a terminal device, the terminal device comprises a camera, and the method comprises the following steps:
acquiring an image to be detected, wherein the image to be detected comprises a target image, the target image is an image corresponding to an object to be detected, and the camera is used for acquiring the image to be detected;
determining target detection information according to the image to be detected and a preset detection model, wherein the target detection information comprises three-dimensional position information of the object to be detected in a preset camera coordinate system, and the preset camera coordinate system is a coordinate system corresponding to the camera;
and determining relative position data between the object to be detected and the terminal equipment according to the three-dimensional position information so that the terminal equipment can position the object to be detected according to the relative position data.
2. The object positioning method according to claim 1, further comprising, after determining the relative position data between the object to be detected and the terminal device according to the three-dimensional position information:
and determining a running route of the vehicle according to the relative position data so as to enable the vehicle to avoid the object to be detected, wherein the terminal equipment is the vehicle.
3. The object positioning method according to claim 1, further comprising, after determining the relative position data between the object to be detected and the terminal device according to the three-dimensional position information:
determining superposition position information of augmented reality content according to the relative position data, and displaying an augmented reality scene image according to the augmented reality content, the reality scene image and the superposition position information, wherein the reality scene image is an image shot by the camera, and the reality scene image comprises the target image.
4. The object positioning method according to any one of claims 1 to 3, wherein the determining target detection information according to the image to be detected and a preset detection model comprises:
determining two-dimensional information of the target image in a preset image coordinate system according to the image to be detected and a preset detection model, wherein the two-dimensional information is used for representing the relative position of the target image in the image to be detected;
and determining the three-dimensional position information according to the two-dimensional information and the mapping relation between a preset camera coordinate system and the preset image coordinate system.
5. The object positioning method according to claim 4, wherein before determining the target detection information according to the image to be detected and a preset detection model, the method further comprises:
the method comprises the steps of training a preset convolutional neural network by utilizing a labeled data training set to generate a preset detection model, wherein the labeled data training set comprises a plurality of training images and labeled marking frame information, the labeled marking frame information comprises labeled marking frame position information, labeled marking frame size information and labeled observation angle information, and the labeled marking frame information is used for marking the position of a target image in the training images.
6. The object positioning method according to claim 5, wherein the training a preset convolutional neural network by utilizing a labeled data training set to generate the preset detection model comprises:
determining prediction marking frame information according to the preset convolutional neural network, wherein the prediction marking frame information is used for representing the position of the target image in the training image, and the prediction marking frame information comprises position information of a prediction marking frame, size information of the prediction marking frame and prediction observation angle information;
determining a marking frame matching value according to the prediction marking frame information and the marking frame information;
selecting the predicted observation angle information corresponding to the predicted marking frame information with the marking frame matching value larger than the preset matching value to carry out discretization processing so as to generate a plurality of discrete observation angle information;
determining label observation angle information according to the marked observation angle information and the discrete observation angle information;
determining a loss value between the predicted observation angle information and the label observation angle information according to the label observation angle information and a preset loss function;
and updating parameters of the preset convolutional neural network according to the loss value to generate the preset detection model.
7. The object positioning method according to claim 6, wherein the determining two-dimensional information of the target image in a preset image coordinate system according to the image to be detected and a preset detection model comprises:
and determining the two-dimensional information according to the image to be detected and a preset detection model, wherein the two-dimensional information comprises coordinate information of a target marking frame, size information of the target marking frame and angle information of a target observation angle.
8. An object positioning apparatus, applied to a terminal device, the terminal device including a camera, the apparatus comprising:
the device comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring an image to be detected, the image to be detected comprises a target image, the target image is an image corresponding to an object to be detected, and the camera is used for acquiring the image to be detected;
the first determining module is used for determining target detection information according to the image to be detected and a preset detection model, wherein the target detection information comprises three-dimensional position information of the object to be detected in a preset camera coordinate system, and the preset camera coordinate system is a coordinate system corresponding to the camera;
and the second determining module is used for determining relative position data between the object to be detected and the terminal equipment according to the three-dimensional position information so that the terminal equipment can position the object to be detected according to the relative position data.
9. An electronic device, comprising:
a camera;
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the object positioning method of any one of claims 1-7.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the object positioning method of any one of claims 1-7.
CN201911077837.5A 2019-11-06 2019-11-06 Object positioning method and device, electronic equipment and storage medium Pending CN110807431A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911077837.5A CN110807431A (en) 2019-11-06 2019-11-06 Object positioning method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110807431A true CN110807431A (en) 2020-02-18

Family

ID=69501794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911077837.5A Pending CN110807431A (en) 2019-11-06 2019-11-06 Object positioning method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110807431A (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016194847A (en) * 2015-04-01 2016-11-17 キヤノン株式会社 Image detection device, image detection method, and program
CN106683091A (en) * 2017-01-06 2017-05-17 北京理工大学 Target classification and attitude detection method based on depth convolution neural network
CN108121986A (en) * 2017-12-29 2018-06-05 深圳云天励飞技术有限公司 Object detection method and device, computer installation and computer readable storage medium
WO2019062619A1 (en) * 2017-09-29 2019-04-04 阿里巴巴集团控股有限公司 Method, apparatus and system for automatically labeling target object within image
CN109974733A (en) * 2019-04-02 2019-07-05 百度在线网络技术(北京)有限公司 POI display methods, device, terminal and medium for AR navigation
CN110310315A (en) * 2018-03-21 2019-10-08 北京猎户星空科技有限公司 Network model training method, device and object pose determine method, apparatus
CN110390258A (en) * 2019-06-05 2019-10-29 东南大学 Image object three-dimensional information mask method
CN110388931A (en) * 2018-04-17 2019-10-29 百度(美国)有限责任公司 The two-dimentional bounding box of object is converted into the method for the three-dimensional position of automatic driving vehicle


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112017300A (en) * 2020-07-22 2020-12-01 青岛小鸟看看科技有限公司 Processing method, device and equipment for mixed reality image
CN114079696A (en) * 2020-08-21 2022-02-22 海能达通信股份有限公司 Terminal calling method and device and electronic equipment
WO2022062027A1 (en) * 2020-09-28 2022-03-31 冯家禧 Wine product positioning method and apparatus, wine product information management method and apparatus, and device and storage medium
GB2613753A (en) * 2020-09-28 2023-06-14 Hey Anthony Fung Kar Wine product positioning method and apparatus, wine product information management method and apparatus, and device and storage medium
CN112329645A (en) * 2020-11-06 2021-02-05 北京迈格威科技有限公司 Image detection method, image detection device, electronic equipment and storage medium
WO2022095514A1 (en) * 2020-11-06 2022-05-12 北京迈格威科技有限公司 Image detection method and apparatus, electronic device, and storage medium
WO2022205663A1 (en) * 2021-03-30 2022-10-06 北京市商汤科技开发有限公司 Neural network training method and apparatus, target object detecting method and apparatus, and driving control method and apparatus


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20230317