CN112132895B - Image-based position determination method, electronic device, and storage medium - Google Patents


Info

Publication number
CN112132895B
CN112132895B (application CN202010947211.1A)
Authority
CN
China
Prior art keywords
frame image
static target
key point
image
current frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010947211.1A
Other languages
Chinese (zh)
Other versions
CN112132895A (en)
Inventor
赵龙贺
钱智明
李正宁
魏曦
赵磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ecarx Hubei Tech Co Ltd
Original Assignee
Hubei Ecarx Technology Co Ltd
Application filed by Hubei Ecarx Technology Co Ltd
Priority to CN202010947211.1A
Publication of CN112132895A
Application granted
Publication of CN112132895B
Legal status: Active

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T7/00 Image analysis
            • G06T7/10 Segmentation; Edge detection
              • G06T7/13 Edge detection
            • G06T7/70 Determining position or orientation of objects or cameras
              • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
                • G06T7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
          • G06T2207/00 Indexing scheme for image analysis or image enhancement
            • G06T2207/10 Image acquisition modality
              • G06T2207/10016 Video; Image sequence
            • G06T2207/20 Special algorithmic details
              • G06T2207/20112 Image segmentation details
                • G06T2207/20164 Salient point detection; Corner detection
            • G06T2207/30 Subject of image; Context of image processing
              • G06T2207/30248 Vehicle exterior or interior
                • G06T2207/30252 Vehicle exterior; Vicinity of vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application provide an image-based position determination method, an electronic device, and a storage medium, applied to the technical field of image processing. The essential matrix contains both rotation and translation attributes, and the position of a static target is determined from key points through the essential matrix. Because the rotation attribute of the target is taken into account, the targets are not required to have a consistent physical size, and the normal vector of the static target is not required to be perpendicular to the image plane of the image acquisition device. Meanwhile, since key points are used for the calculation, only the positions of the key points need to be determined, and the pixel boundary of the static target does not need to be strictly located. The limitation of objective conditions on positioning accuracy is thus reduced, and the accuracy of the target position determined based on the image can be improved.

Description

Image-based position determination method, electronic device, and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method for determining a position based on an image, an electronic device, and a storage medium.
Background
With the development of technologies such as unmanned aerial vehicles and autonomous vehicle driving, unmanned driving technology is increasingly applied in production and daily life. In unmanned driving, accurately sensing the positions of surrounding targets is crucial for obstacle avoidance and path planning. How to accurately determine the position of a target relative to the vehicle or drone is therefore a critical issue.
In the prior art, a camera carried by the vehicle or drone acquires images of surrounding targets, and the position of each target in the camera coordinate system is calculated from the physical size and pixel proportion of the target, based on the similar-triangle relationship of pinhole imaging.
However, the above method requires three conditions to be satisfied: 1. the physical sizes of the targets are consistent; 2. the normal vector of the target is perpendicular to the image plane; 3. the position of the target boundary in the image is accurately located. In actual scenes these three conditions are difficult to satisfy simultaneously, so the target position determined based on the image has a large error.
Disclosure of Invention
An object of the embodiments of the present application is to provide an image-based position determination method, an electronic device, and a storage medium, so as to improve the accuracy of a target position determined based on an image. The specific technical solution is as follows:
in a first aspect, an embodiment of the present application provides an image-based position determining method, where the method includes:
acquiring a current frame image;
analyzing the current frame image to obtain the coordinates of key points of a reference static target in the current frame image as the coordinates of a first key point;
determining coordinates of a second key point corresponding to the first key point in a historical frame image before the current frame image to obtain a key point pair of the reference static target, wherein the first key point and the second key point corresponding to the first key point form the key point pair of the reference static target;
calculating to obtain an essential matrix for representing coordinate transformation according to the coordinates of the first key point and the coordinates of the second key point in the key point pair of the reference static target;
determining a coordinate translation vector from the acquisition position of the historical frame image to the acquisition position of the current frame image;
acquiring key point pairs of a static target to be detected in the current frame image and the historical frame image;
and calculating to obtain the relative position of the static target to be detected according to the key point pairs of the static target to be detected, the essential matrix and the coordinate translation vector.
In a possible implementation manner, the analyzing the current frame image to obtain coordinates of the key points of the reference static target in the current frame image as coordinates of the first key points includes:
determining the area where the reference static target in the current frame image is located;
and carrying out corner point detection on the area where the reference static target is located to obtain coordinates of the corner points of the reference static target as coordinates of the first key point.
In a possible implementation manner, the performing corner detection on the region where the reference static target is located to obtain coordinates of a corner of the reference static target as coordinates of the first key point includes:
carrying out corner detection on the region where the reference static target is located to obtain an initial corner of the reference static target;
performing sub-pixelation processing on the current frame image, and performing grid division on the sub-pixelation processed current frame image according to a preset interval;
and performing non-maximum suppression on the initial corner points in each grid of the current frame image after sub-pixelation processing to obtain the coordinates of the corner points of the reference static target after non-maximum suppression as the coordinates of the first key points.
In a possible implementation manner, the determining, in a history frame image before the current frame image, coordinates of a second keypoint corresponding to the first keypoint to obtain a keypoint pair of the reference static target includes:
selecting one frame image from the frame images adopted before the current frame image as a historical frame image;
and tracking a point with the luminosity most similar to that of the first key point in the historical frame image by using an optical flow tracking algorithm to serve as a second key point, so as to obtain a key point pair of the reference static target.
In a possible implementation manner, the selecting one of the frame images used before the current frame image as a historical frame image includes:
selecting an Nth frame of image collected before the current frame of image as an image to be compared, wherein N is a positive integer;
judging whether the parallax between the selected image to be compared and the current frame image is larger than a preset parallax threshold value or not;
if the parallax is not larger than the preset parallax threshold, increasing N by a specified value and returning to execute the step of taking the Nth frame of image collected before the current frame of image is selected as the image to be compared;
and if the parallax is larger than the preset parallax threshold value, taking the selected image to be compared as the historical frame image.
In one possible embodiment, the essential matrix comprises a normalized translation vector;
the determining a coordinate translation vector from the acquisition position of the historical frame image to the acquisition position of the current frame image comprises:
acquiring the distance from the acquisition position of the historical frame image to the acquisition position of the current frame image as a movement distance;
and calculating a coordinate translation vector from the acquisition position of the historical frame image to the acquisition position of the current frame image according to the normalized translation vector and the movement distance.
In one possible embodiment, the essential matrix comprises a normalized translation vector and a rotation matrix;
the calculating the relative position of the static target to be detected according to the key point pairs of the static target to be detected, the essential matrix and the coordinate translation vector comprises:
acquiring coordinates of a third key point and a fourth key point in the key point pair in a coordinate system of image acquisition equipment for each key point pair in the static target to be detected, wherein the third key point is a key point of the static target to be detected in the current frame image, and the fourth key point is a key point of the static target to be detected in the historical frame image;
using the formulas $s_0 x_0 = s_1 R x_1 + t$ and $s_1 x_0^{\wedge} R x_1 + x_0^{\wedge} t = 0$, calculating to obtain a first scale $s_1$ and a second scale $s_0$, wherein $x_0$ and $x_1$ respectively represent the coordinates of the third key point and the fourth key point in the coordinate system of the image acquisition device, $R$ represents the rotation matrix, $t$ represents the coordinate translation vector, and $(\cdot)^{\wedge}$ denotes the antisymmetric (skew-symmetric) matrix of a vector;
according to the formula $P = s_0 x_0 = s_0 K^{-1} p_0$, calculating the relative position of the static target to be detected, wherein $K$ represents the internal parameter matrix of the image acquisition device, $p_0$ represents the pixel coordinates of the third key point, and $P$ represents the three-dimensional coordinate point, relative to the image acquisition device, of the world-coordinate-system point corresponding to the third key point; the relative position of the static target to be detected is represented by this three-dimensional coordinate point.
In a possible implementation manner, the acquiring key point pairs of the static target to be detected in the current frame image and the historical frame image includes:
matching each static target of the current frame image with each static target of the historical frame image to obtain each full matching result, wherein each full matching result comprises a plurality of static target pairs, and each static target pair comprises a static target in the current frame image and a static target in the historical frame image;
respectively calculating the distance between two static targets in each static target pair to obtain the distance parameter of each static target pair;
calculating the sum of the distance parameters of each static target pair in the full matching result aiming at each full matching result to obtain the total distance of the full matching result;
selecting a full-matching result with the minimum total distance as a target full-matching result;
selecting a static target pair corresponding to a target to be detected from each static target pair of the target full-matching result to obtain a first static target and a second static target, wherein the first static target is a static target in the current frame image, and the second static target is a static target in the historical frame image;
acquiring coordinates of each key point of the first static target under a coordinate system of image acquisition equipment;
and respectively determining the coordinates of the key points of the first static target in the second static target according to the coordinates of the key points of the first static target and the essential matrix to obtain the key point pairs of the static target to be detected.
In a second aspect, an embodiment of the present application provides an electronic device, including a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to implement any of the image-based position determining methods described above when executing the program stored in the memory.
In a third aspect, embodiments of the present application provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform any of the image-based position determination methods described above.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements any of the image-based position determination methods described above.
The image-based position determination method, the electronic device, and the storage medium provided by the embodiments of the present application acquire a current frame image; analyze the current frame image to obtain the coordinates of the key points of the reference static target in the current frame image as the coordinates of the first key points; determine, in a historical frame image before the current frame image, the coordinates of the second key point corresponding to each first key point to obtain the key point pairs of the reference static target, where a first key point and its corresponding second key point form one key point pair of the reference static target; calculate an essential matrix representing the coordinate transformation according to the coordinates of the first key point and the second key point in each key point pair of the reference static target; determine a coordinate translation vector from the acquisition position of the historical frame image to the acquisition position of the current frame image; acquire key point pairs of a static target to be detected in the current frame image and the historical frame image; and calculate the relative position of the static target to be detected according to the key point pairs of the static target to be detected, the essential matrix, and the coordinate translation vector. The essential matrix contains both rotation and translation attributes, and the position of the static target is determined from key points through the essential matrix. Because the rotation attribute of the target is taken into account, the targets are not required to have a consistent physical size, and the normal vector of the static target is not required to be perpendicular to the image plane of the image acquisition device. Meanwhile, since key points are used for the calculation, only the positions of the key points need to be determined, and the pixel boundary of the static target does not need to be strictly located. The limitation of objective conditions on positioning accuracy is thus reduced, and the accuracy of the target position determined based on the image can be improved. Of course, not all of the advantages described above need to be achieved simultaneously by any one product or method implementing the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application and the technical solutions in the prior art, the drawings required for the description of the embodiments and the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of an image-based location determination method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of one possible implementation of step S12 in the embodiment shown in FIG. 1;
FIG. 3 is a schematic diagram of one possible implementation of step S13 in the embodiment shown in FIG. 1;
FIG. 4 is a schematic diagram of one possible implementation manner of step S122 in the embodiment shown in FIG. 2;
FIG. 5 is a schematic diagram of one possible implementation manner of step S131 in the embodiment shown in FIG. 3;
FIG. 6 is a schematic diagram of one possible implementation of step S15 in the embodiment shown in FIG. 1;
FIG. 7 is a schematic diagram of one possible implementation of step S17 in the embodiment shown in FIG. 1;
FIG. 8 is a schematic diagram of one possible implementation of step S16 in the embodiment shown in FIG. 1;
FIG. 9 is a schematic diagram illustrating the calculation of key point pairs of a static object to be detected according to an embodiment of the present application;
fig. 10 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to increase the accuracy of the target position determined based on the image, an embodiment of the present application provides an image-based position determining method, including:
acquiring a current frame image;
analyzing the current frame image to obtain the coordinates of the key points of the reference static target in the current frame image as the coordinates of the first key points;
determining coordinates of a second key point corresponding to the first key point in a historical frame image before the current frame image to obtain a key point pair of the reference static target, wherein the first key point and the corresponding second key point form the key point pair of the reference static target;
calculating to obtain an essential matrix for representing coordinate transformation according to the coordinate of the first key point and the coordinate of the second key point in the key point pair of the reference static target;
determining a coordinate translation vector from the acquisition position of the historical frame image to the acquisition position of the current frame image;
acquiring key point pairs of a static target to be detected in the current frame image and the historical frame image;
and calculating to obtain the relative position of the static target to be detected according to the key point pairs of the static target to be detected, the essential matrix and the coordinate translation vector.
In the embodiment of the present application, the essential matrix contains both rotation and translation attributes, and the position of the static target is determined from key points through the essential matrix. Because the rotation attribute of the target is taken into account, the targets are not required to have a consistent physical size, and the normal vector of the static target is not required to be perpendicular to the image plane of the image acquisition device. Meanwhile, since key points are used for the calculation, only the positions of the key points need to be determined, and the pixel boundary of the static target does not need to be strictly located. The limitation of objective conditions on positioning accuracy is thus reduced, and the accuracy of the target position determined based on the image can be improved.
Referring to fig. 1, fig. 1 is a schematic flowchart of an image-based position determining method according to an embodiment of the present application, and includes:
s11, a current frame image is acquired.
The image-based position determination method in the embodiment of the present application may be implemented by an electronic device. Specifically, the electronic device may be a module or device with computing capability carried by an unmanned aerial vehicle or an unmanned vehicle; the position determination method may also be implemented by a back-end computer communicatively connected to the unmanned aerial vehicle or unmanned vehicle.
The current frame image may be the frame image currently acquired by the image acquisition device. The image acquisition device may be a camera, a driving recorder, or similar equipment carried in a vehicle such as an unmanned aerial vehicle or an automobile.
For example, video data may be continuously captured by the image acquisition device, and the current frame image is the frame obtained from the video data when target localization is performed. The current frame image includes at least one static target, where a static target refers to a target that does not move or whose moving speed is negligible (less than a preset speed threshold); for example, the static target may be a signboard, a lane line, a guide arrow, a lamp post, or the like.
S12, analyzing the current frame image to obtain the coordinates of the key points of the reference static object in the current frame image as the coordinates of the first key points.
The reference static target may be any one or more static targets in the current frame image. The coordinates of its key points in the current frame image, referred to as the first key points, can be obtained by computer vision techniques such as a deep learning network. Specifically, the coordinates of a first key point may be its coordinates in the image acquisition device coordinate system.
In a possible implementation manner, referring to fig. 2, the step of analyzing the current frame image to obtain coordinates of the key point of the reference static target in the current frame image as coordinates of the first key point includes:
and S121, determining the area where the reference static object in the current frame image is located.
Target detection can be performed on the current frame image by computer vision techniques to obtain the region where the reference static target is located. For example, the region of each semantic target (static target) in the current frame image may be obtained based on semantic segmentation; the static target may be a signboard, a lane line, a guide arrow, a lamp post, or the like, and the region where it is located may be represented by a target box.
S122, corner detection is performed on the region where the reference static target is located to obtain the coordinates of the corners of the reference static target as the coordinates of the first key points.
The corners of the reference static target are its key points. Corner detection may be performed on the region where the static target is located by the Harris corner detection method to obtain the coordinates of the corners of the static target.
S13, determining coordinates of a second keypoint corresponding to the first keypoint in a historical frame image before the current frame image to obtain a keypoint pair of the reference static target, where the first keypoint and the second keypoint corresponding to the first keypoint constitute the keypoint pair of the reference static target.
The historical frame image is a frame image collected before the current frame image, and the historical frame image has the same reference static target as the current frame image.
The key point coordinates of the reference static target in the historical frame image may be obtained in the same manner as those in the current frame image. However, in order to reduce the amount of calculation for determining the key point coordinates of the reference static target in the historical frame image, and to facilitate pairing the same key point of the same reference static target across the current frame image and the historical frame image, optionally, referring to fig. 3, step S13 of determining, in a historical frame image before the current frame image, the coordinates of the second key point corresponding to the first key point to obtain the key point pairs of the reference static target includes:
s131, selecting one frame image from the frame images used before the current frame image as a history frame image.
And S132, tracking and obtaining a point with the luminosity most similar to that of the first key point in the historical frame image by using an optical flow tracking algorithm to serve as a second key point, and obtaining a key point pair of the reference static target.
A KLT (Kanade-Lucas-Tomasi) optical flow tracking algorithm may be used: according to the principle of photometric consistency, the point in the historical frame image whose photometry is most similar to that of a first key point of the reference static target in the current frame image is found by tracking and serves as the second key point of the reference static target in the historical frame image. This yields the second key point coordinates of the reference static target in the historical frame image, and the first key point in the current frame image and the second key point in the historical frame image form a key point pair.
In the embodiment of the application, the coordinates of the key points of the static target in the historical frame image are determined through an optical flow tracking algorithm, so that the calculation amount for determining the coordinates of the key points of the static target in the historical frame image can be reduced, and meanwhile, the same key points of the same static target in the current frame image and the historical frame image can be conveniently paired.
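As an illustration of this tracking step, the following is a minimal Python sketch using OpenCV's pyramidal KLT tracker; the function name, window size, and pyramid depth are illustrative assumptions rather than values from the patent.

```python
import cv2
import numpy as np

def track_keypoints(curr_gray, hist_gray, first_keypoints):
    """Track first key points of the current frame into the historical
    frame, keeping only the successfully tracked pairs."""
    pts0 = first_keypoints.astype(np.float32).reshape(-1, 1, 2)
    # Search the historical frame for the point photometrically most
    # similar to each first key point (photometric consistency).
    pts1, status, _err = cv2.calcOpticalFlowPyrLK(
        curr_gray, hist_gray, pts0, None, winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1
    # Each surviving (first key point, second key point) tuple is one
    # key point pair of the reference static target.
    return pts0[ok].reshape(-1, 2), pts1[ok].reshape(-1, 2)
```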
S14, determining a coordinate translation vector from the collection position of the history frame image to the collection position of the current frame image.
The translation vector of the image acquisition device, in the world coordinate system, from the acquisition position of the historical frame image to the acquisition position of the current frame image is called the coordinate translation vector. The coordinate translation vector may be obtained by using the Inertial Measurement Unit (IMU) corresponding to the image acquisition device, i.e., the IMU carried by the unmanned aerial vehicle or vehicle that carries the image acquisition device. Since the IMU and the image acquisition device are related by a rigid-body transformation, the motion information of the image acquisition device can be obtained from the IMU.
And S15, calculating to obtain an essential matrix for representing coordinate transformation according to the coordinate of the first key point and the coordinate of the second key point in the key point pair of the reference static target.
The essential matrix is used to represent the coordinate transformation from the historical frame image to the current frame image and contains both rotation and translation attributes. The coordinates of the first key point and the second key point in a key point pair of the reference static target are the coordinates of the same key point of the reference static target in the current frame image and the historical frame image, respectively. The essential matrix can therefore be obtained by acquiring key point pairs of the reference static target and performing structure recovery.
Alternatively, the essential matrix may be calculated by an SfM (Structure from Motion) algorithm. According to the epipolar constraint equation $x_0^{T} E x_1 = 0$, the essential matrix $E$ is calculated, where $x_0$ and $x_1$ respectively represent the coordinates of the same key point, in the current frame image and the historical frame image, under the coordinate system of the image acquisition device; $x_0 = K^{-1} p_0$, where $K$ denotes the internal parameter matrix of the image acquisition device and $p_0$ represents the coordinates of the key point in the pixel coordinate system of the current frame image; the superscript $T$ denotes the matrix transpose. The essential matrix is $E = t^{\wedge} R$, where $t$ denotes the normalized translation vector and $R$ denotes the rotation matrix. Optionally, a plurality of key point pairs of the reference static target may be used in the calculation of the essential matrix, so as to increase the confidence.
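As a concrete sketch of this structure-recovery step, the code below estimates E from the reference target's key point pairs with OpenCV's RANSAC solver and decomposes it into R and the normalized t; the parameter values are illustrative assumptions.

```python
import cv2

def estimate_essential(pts_curr, pts_hist, K):
    """pts_curr/pts_hist: matched pixel coordinates (N x 2) of the same
    key points in the current and historical frames; K: camera intrinsics."""
    # RANSAC fit of the epipolar constraint x0^T E x1 = 0 over many pairs,
    # which also raises the confidence as noted above.
    E, inliers = cv2.findEssentialMat(pts_hist, pts_curr, K,
                                      method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    # Decompose E = t^ R into the rotation matrix and the normalized
    # translation vector (t is recovered only up to scale).
    _, R, t, _ = cv2.recoverPose(E, pts_hist, pts_curr, K, mask=inliers)
    return E, R, t
```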
And S16, acquiring key point pairs of the static target to be detected in the current frame image and the historical frame image.
The static target to be detected is the static target that needs to be positioned; it may be the reference static target or a static target different from the reference static target. When the static target to be detected is the reference static target, the key point pairs of the reference static target can be used directly as the key point pairs of the static target to be detected; for example, they can be read directly from memory. When the static target to be detected differs from the reference static target, its key point pairs may be acquired by the same method as for the reference static target, which is not described again here. The pixel coordinates of the same key point of the static target to be detected in the current frame image and the historical frame image can be converted into the coordinate system of the image acquisition device, thereby obtaining the coordinates of the two key points in each key point pair of the static target to be detected.
And S17, calculating the relative position of the static target to be detected according to the key point pairs of the static target to be detected, the essential matrix and the coordinate translation vector.
From the coordinates, in the image acquisition device coordinate system, of the two key points in a key point pair of the static target to be detected, the actual position, relative to the image acquisition device, of the spatial point corresponding to that key point pair can be recovered through the coordinate translation vector and the essential matrix; the coordinates of the corresponding point relative to the image acquisition device are then calculated by a method such as triangulation. Performing this operation for each key point pair of the static target to be detected yields the coordinates of each of its key points relative to the image acquisition device, i.e., the position of the static target to be detected relative to the image acquisition device, which is its relative position.
The embodiment of the present application takes the calculation process for one static target to be detected as an example. Those skilled in the art will understand that, when there are multiple static targets to be detected, steps S16 and S17 may be performed for each of them after the essential matrix and the coordinate translation vector are obtained.
In the embodiment of the application, the essential matrix comprises two attributes of rotation and translation, the key points are used for determining the position of the static target through the essential matrix, the rotation attribute of the target is considered, the physical size of the target is not required to be consistent, the normal vector of the static target is not required to be vertical to the image plane of the image acquisition equipment, meanwhile, the key points are used for calculation, only the position of the key points is required to be determined, the pixel boundary of the static target is not required to be strictly positioned, the limitation of objective conditions on the positioning accuracy is reduced, and the accuracy of the target position determined based on the image can be improved.
In the process of determining the corner coordinates of the static target by corner detection, each corner can be corrected in order to increase the accuracy of the corner coordinates. In a possible implementation manner, referring to fig. 4, step S122 of performing corner detection on the region where the reference static target is located to obtain the coordinates of the corners of the reference static target as the coordinates of the first key points includes:
and S1221, performing corner detection on the region where the reference static target is located to obtain an initial corner of the reference static target.
S1222, performing sub-pixelation on the current frame image, and performing grid division on the sub-pixelation processed current frame image according to a preset interval.
And S1223, performing non-maximum suppression on the initial corner points in each grid of the current frame image after the sub-pixelation processing, and obtaining coordinates of the corner points of the reference static target after the non-maximum suppression as coordinates of the first key points.
Harris corner detection and sub-pixelation can be adopted; the current frame image is then divided into grids at equal intervals, and non-maximum suppression is performed within each grid to obtain corners with uniformly arranged positions, thereby obtaining the corner coordinates of the reference static target.
In the embodiment of the application, the corner points are corrected in the modes of sub-pixelation, grid division, non-maximum value inhibition and the like, so that more accurate corner point coordinates can be obtained, and the accuracy of the target position determined based on the image is improved.
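A minimal sketch of this corner pipeline (Harris detection restricted to the target region, sub-pixel refinement, grid-wise suppression) is given below; the grid size, corner count, and quality threshold are illustrative assumptions.

```python
import cv2
import numpy as np

def detect_corners(gray, target_mask, grid=32):
    # Harris-based initial corners inside the region where the
    # reference static target is located.
    pts = cv2.goodFeaturesToTrack(gray, maxCorners=500, qualityLevel=0.01,
                                  minDistance=3, mask=target_mask,
                                  useHarrisDetector=True, k=0.04)
    if pts is None:
        return np.empty((0, 2), np.float32)
    # Sub-pixel refinement of the initial corner coordinates.
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.01)
    cv2.cornerSubPix(gray, pts, (5, 5), (-1, -1), criteria)
    # Grid division at a preset interval; goodFeaturesToTrack returns
    # corners sorted by strength, so keeping the first corner per cell
    # approximates non-maximum suppression within each grid cell.
    kept, seen = [], set()
    for x, y in pts.reshape(-1, 2):
        cell = (int(x) // grid, int(y) // grid)
        if cell not in seen:
            seen.add(cell)
            kept.append((x, y))
    return np.array(kept, np.float32)
```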
In theory, the historical frame image may be any frame image acquired before the current frame image. However, to increase the accuracy of the calculation, in one possible embodiment, referring to fig. 5, step S131 of selecting one frame image from the frame images acquired before the current frame image as the historical frame image includes:
s1311, selecting an nth frame image collected before the current frame image as an image to be compared, where N is a positive integer.
The Nth frame image acquired before the current frame image refers to the image N frames before the current frame image. For example, if the current frame image is the Mth frame image acquired by the image acquisition device, the image to be compared is the (M-N)th frame image acquired by the image acquisition device, where M is a positive integer greater than N.
S1312, determining whether the parallax between the currently selected image to be compared and the current frame image is greater than a preset parallax threshold.
Parallax is the difference in direction produced when the same target is observed from two points separated by a certain distance. In stereo vision it is an important index of whether the same target differs between two frame images, i.e., the difference between the imaging positions of corners in the two frame images. The preset parallax threshold can be set according to the actual situation, but it should ensure that the imaging positions of the same corner in the two frame images differ noticeably.
S1313, if the parallax is not greater than the preset parallax threshold, N is increased by a specified value, and execution returns to step S1311 of selecting the Nth frame image acquired before the current frame image as the image to be compared.
The specified value can be set according to the actual situation, for example, to 1, 3, 5, or 20.
S1314, if the parallax is greater than the preset parallax threshold, the currently selected image to be compared is taken as the historical frame image.
In the embodiment of the application, this ensures that the parallax of the same corner between the current frame image and the historical frame image is greater than the preset parallax threshold, thereby ensuring the accuracy of the target position determined based on the image.
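The selection loop of steps S1311 to S1314 can be sketched as follows; mean_parallax is a hypothetical helper that tracks the same corners across the two frames and returns the mean displacement of their imaging positions in pixels, and the threshold and step values are illustrative assumptions.

```python
def select_history_frame(frames, curr_idx, parallax_thresh=10.0, step=1):
    """Return the most recent earlier frame whose parallax with the
    current frame exceeds the preset parallax threshold, or None."""
    n = step
    while curr_idx - n >= 0:
        candidate = frames[curr_idx - n]       # Nth frame before current
        if mean_parallax(candidate, frames[curr_idx]) > parallax_thresh:
            return candidate                   # parallax large enough
        n += step                              # increase N, try again
    return None                                # no earlier frame qualifies
```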
Besides directly using the displacement measured by the IMU as the coordinate translation vector, the coordinate translation vector may also be calculated in combination with the essential matrix. In a possible embodiment, the essential matrix includes a normalized translation vector; referring to fig. 6, the step of determining the coordinate translation vector from the acquisition position of the historical frame image to the acquisition position of the current frame image includes:
s151, a distance from the capture position of the history frame image to the capture position of the current frame image is acquired as a movement distance.
For example, the movement distance may be obtained by using an inertial measurement unit corresponding to the image acquisition device.
And S152, calculating a coordinate translation vector from the acquisition position of the historical frame image to the acquisition position of the current frame image according to the normalized translation vector and the movement distance.
In the above steps, the essential matrix is obtained through structure recovery based on a monocular image acquisition device, and the normalized translation vector $t$ in the essential matrix is a normalized quantity that cannot represent the translation of the image acquisition device in the world coordinate system. The movement distance provided by the IMU is therefore combined: since the IMU and the image acquisition device are related by a rigid-body transformation, the motion information of the image acquisition device can be approximated by that of the IMU. From the translational motion $t_{IMU}$ measured by the IMU, the modulus $t_{Norm} = \lVert t_{IMU} \rVert_2$ is calculated as the movement distance, i.e., an absolute measurement of the scale; multiplying the normalized translation vector $t$ by $t_{Norm}$ yields the translation vector of the image acquisition device in the world coordinate system, namely the coordinate translation vector.
In the embodiment of the application, the coordinate translation vector is calculated by combining the essential matrix, and the obtained coordinate translation vector is more accurate, so that the accuracy of the target position determined based on the image can be improved.
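A minimal sketch of steps S151 and S152, assuming the IMU reports a displacement vector between the two acquisition positions:

```python
import numpy as np

def coordinate_translation_vector(t_normalized, imu_displacement):
    """Scale the normalized translation vector from the essential matrix
    by the IMU-measured movement distance (IMU rigidly attached to the
    image acquisition device)."""
    t_norm = np.linalg.norm(imu_displacement)        # movement distance
    t_unit = t_normalized / np.linalg.norm(t_normalized)
    # Multiplying the unit-scale translation by the absolute distance
    # yields the coordinate translation vector in the world system.
    return t_norm * t_unit
```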
The position of the static target relative to the image acquisition device can be calculated by triangulation. In one possible embodiment, the essential matrix includes a normalized translation vector and a rotation matrix; referring to fig. 7, step S17 of calculating the relative position of the static target to be detected according to the key point pairs of the static target to be detected, the essential matrix, and the coordinate translation vector includes:
s171, for each key point pair in the static target to be detected, obtaining coordinates of a third key point and a fourth key point in the key point pair in a coordinate system of an image capturing device, where the third key point is a key point of the static target to be detected in the current frame image, and the fourth key point is a key point of the static target to be detected in the historical frame image.
The third key point and the fourth key point in a key point pair of the static target to be detected are the projections, in the current frame image and the historical frame image respectively, of the same key point of the static target to be detected.
S172, using the formulas $s_0 x_0 = s_1 R x_1 + t$ and $s_1 x_0^{\wedge} R x_1 + x_0^{\wedge} t = 0$, a first scale $s_1$ and a second scale $s_0$ are calculated, where $x_0$ and $x_1$ respectively represent the coordinates of the third key point and the fourth key point in the coordinate system of the image acquisition device, $R$ represents the rotation matrix, $t$ represents the coordinate translation vector, and $(\cdot)^{\wedge}$ denotes the antisymmetric (skew-symmetric) matrix of a vector.
S173, according to the formula $P = s_0 x_0 = s_0 K^{-1} p_0$, the relative position of the static target to be detected is calculated, where $K$ represents the internal parameter matrix of the image acquisition device, $p_0$ represents the pixel coordinates of the third key point, and $P$ represents the three-dimensional coordinate point, relative to the image acquisition device, of the world-coordinate-system point corresponding to the third key point; the relative position of the static target to be detected is represented by this three-dimensional coordinate point.
In the embodiment of the application, the position of the static target is determined from the key points by triangulation: the targets are not required to have a consistent physical size, the normal vector of the static target is not required to be perpendicular to the image plane of the image acquisition device, and the pixel boundary of the static target need not be strictly positioned. The limitation of objective conditions on positioning accuracy is thus reduced, and the accuracy of the target position determined based on the image can be improved.
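The two-equation triangulation of steps S171 to S173 can be sketched as below, solving the scales in least squares; the function and variable names are illustrative assumptions.

```python
import numpy as np

def skew(v):
    # Antisymmetric matrix v^ such that skew(v) @ u == np.cross(v, u).
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def triangulate_pair(x0, x1, R, t):
    """x0, x1: coordinates of the third and fourth key point in the image
    acquisition device coordinate system (3-vectors with z = 1)."""
    # From s0*x0 = s1*R*x1 + t, left-multiplying by x0^ eliminates s0:
    # s1 * (x0^ R x1) = -(x0^ t); solve s1 in least squares.
    a = skew(x0) @ (R @ x1)
    b = -skew(x0) @ t
    s1 = float(a @ b) / float(a @ a)
    # Recover s0 from the z component of s0*x0 = s1*R*x1 + t.
    s0 = (s1 * (R @ x1) + t)[2] / x0[2]
    return s0 * x0          # P = s0 * x0, position relative to the camera
```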
Since the position of the static target is calculated from key point pairs in the embodiment of the application, enough key points are required on the static target so that the calculated position has sufficient confidence. Static targets such as signboards have rich texture and yield many key points, so their relative positions can be calculated directly from their key point pairs. For rod-shaped static targets to be detected, however, the texture is smooth and there are almost no key points, so key point pairs representing the same key point of the same static target to be detected in the current frame image and the historical frame image must first be constructed. In a possible implementation manner, referring to fig. 8, step S16 of acquiring key point pairs of the static target to be detected in the current frame image and the historical frame image includes:
s161, pairing the static targets of the current frame image with the static targets of the historical frame image to obtain full pairing results, where each full pairing result includes a plurality of static target pairs, and each static target pair includes a static target in the current frame image and a static target in the historical frame image.
Any relevant pairing algorithm can be used to pair the static targets of the current frame image with those of the historical frame image. In the embodiment of the application, all static targets of the current frame image and all static targets of the historical frame image can be paired according to all possible pairing cases to obtain all full pairing results, each full pairing result containing one pairing of all static targets in the current frame image with all static targets in the historical frame image. For easier understanding, an example follows: the current frame image contains static targets A, B, and C; the historical frame image contains static targets a, b, and c. Pairing them according to all possible cases yields six full pairing results: full pairing result 1: A-a, B-b, C-c; full pairing result 2: A-a, B-c, C-b; full pairing result 3: A-b, B-a, C-c; full pairing result 4: A-b, B-c, C-a; full pairing result 5: A-c, B-b, C-a; full pairing result 6: A-c, B-a, C-b.
And S162, respectively calculating the distance between two static targets in each static target pair to obtain the distance parameter of each static target pair.
S163, for each full-pairing result, calculating the sum of the distance parameters of each static target pair in the full-pairing result, and obtaining the total distance of the full-pairing result.
And S164, selecting the full-pairing result with the minimum total distance as a target full-pairing result.
And S165, selecting a static target pair corresponding to a target to be detected from all the static target pairs of the target full-matching result to obtain a first static target and a second static target, wherein the first static target is a static target in the current frame image, and the second static target is a static target in the historical frame image.
And S166, acquiring the coordinates of each key point of the first static target in the coordinate system of the image acquisition equipment.
And S167, respectively determining the coordinates of the key points of the first static target in the second static target according to the coordinates of the key points of the first static target and the intrinsic matrix, and obtaining the key point pairs of the static target to be detected.
In the following, the Hungarian assignment algorithm, a global assignment algorithm, is taken as an example; the distance parameter of each static target pair is obtained based on tracking of static targets (taking rod-shaped objects as the example). For instance, for a static target pair consisting of shaft A and shaft B, the distance parameter of the pair is a normalized distance built from $d_{center}$ and $d_{max}$, where A represents the set of pixels of shaft A in the static target pair and B represents the set of pixels of shaft B; $d_{center} = \lVert A_{center} - B_{center} \rVert_2$; $d_{max}$, the maximum distance of the rod-shaped object, can be set manually; $A_{center}$ denotes the center point of shaft A and $B_{center}$ denotes the center point of shaft B. A total-distance matrix of size $M \times N$ is then established, where $M$ and $N$ represent the numbers of rods in the current frame image and the historical frame image, respectively. In accordance with the Hungarian assignment rule of matching as many pairs as possible, e.g. $\{(0,2), (1,3), (2,0), \dots\}$, while keeping the sum of their distances as small as possible, the best matching pairs of rods are finally found.
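A sketch of this global pairing using SciPy's Hungarian solver follows; the distance parameter here is the center distance normalized by d_max and clipped to 1, which is an assumption about the exact form of the original formula.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_rods(centers_curr, centers_hist, d_max=100.0):
    """centers_curr: (M, 2) rod center points in the current frame;
    centers_hist: (N, 2) rod center points in the historical frame."""
    # Total-distance matrix: entry (i, j) is the distance parameter of
    # the static target pair (current rod i, historical rod j).
    d_center = np.linalg.norm(
        centers_curr[:, None, :] - centers_hist[None, :, :], axis=2)
    cost = np.minimum(d_center / d_max, 1.0)
    # The Hungarian assignment minimizes the sum of distance parameters,
    # i.e., selects the full pairing result with the smallest total distance.
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))
```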
The same static target in the current frame image and the above historical frame image is denoted $(A, A')$, and the corresponding straight lines are $(l'_0, l'_1)$. As shown in FIG. 9, point $P$ is a common-view point imaged at $C_0$ and $C_1$ as $p_0$ and $p_1$ in the respective image coordinate systems; the corresponding coordinate in the image acquisition device coordinate system is $x_0 = K^{-1} p_0$, where $K$ is the internal parameter matrix of the image acquisition device (e.g., a camera). The goal of the calculation is to obtain $x_0$ from $p_0$ and to compute $x_1$ from $x_0$ so as to form the key point pair $(x_0, x_1)$: first, a point $p_0$ is selected on static target $A$ and its coordinate $x_0$ in the image acquisition device coordinate system is obtained; according to the epipolar constraint equation $x_0^{T} E x_1 = 0$, where $E$ is the essential matrix, the constraint simplifies to the epipolar line $l_1$ on which $x_1$ must lie; the coordinate $x_1$ is then calculated as the intersection of the epipolar line $l_1$ with the straight line $l'_1$ corresponding to rod $A'$, forming the key point pair $(x_0, x_1)$.
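A minimal sketch of this construction, assuming rod A' has already been fitted as a homogeneous line in normalized coordinates:

```python
import numpy as np

def epipolar_keypoint_pair(x0, E, line_a1):
    """x0: point on rod A in the image acquisition device coordinate
    system (3-vector, z = 1); line_a1: homogeneous line (a, b, c) of
    rod A' in the historical frame's normalized coordinates."""
    l1 = E.T @ x0               # epipolar line from x0^T E x1 = 0
    x1 = np.cross(l1, line_a1)  # intersection of two homogeneous lines
    return x0, x1 / x1[2]       # key point pair (x0, x1)
```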
In the embodiment of the application, when the correspondence between the same key point of the same static target in the current frame image and the historical frame image is difficult to determine, the above calculation improves the accuracy of selecting key point pairs and thereby the accuracy of the target position determined based on the image. In summary, a continuous image sequence is acquired by a monocular image acquisition device; the position of each static target is output by a semantic segmentation network; uniform key points are extracted for each static target; the actual-scale translation is recovered from the essential matrix and the absolute motion amount of the IMU; and the position of each static target in the camera coordinate system is then calculated. The algorithm requires neither consistent physical dimensions of the elements, nor strict positioning of target pixel boundaries, nor the motion assumption that the target normal vector between two frames is perpendicular to the camera plane, and can therefore improve the accuracy of the target position determined based on the image. Since no corresponding assumption is made about the motion of the image acquisition device either, the method is applicable to high-precision target positioning in various scenes.
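For orientation, the sketches above can be wired together as follows; this is an illustrative composition of the hypothetical helpers defined earlier, not an official implementation of the patent.

```python
import numpy as np

def relative_positions(hist_gray, curr_gray, target_mask, K, imu_disp):
    # 1. First key points of the reference static target (current frame).
    pts = detect_corners(curr_gray, target_mask)
    # 2. Second key points in the historical frame via KLT tracking.
    pts_curr, pts_hist = track_keypoints(curr_gray, hist_gray, pts)
    # 3. Essential matrix from the reference key point pairs.
    E, R, t_unit = estimate_essential(pts_curr, pts_hist, K)
    # 4. Coordinate translation vector from the IMU movement distance.
    t = coordinate_translation_vector(t_unit.ravel(), imu_disp)
    # 5. Triangulate each key point pair; for simplicity the reference
    #    target is also treated as the target to be detected here.
    K_inv = np.linalg.inv(K)
    points_3d = []
    for p0, p1 in zip(pts_curr, pts_hist):
        x0 = K_inv @ np.array([p0[0], p0[1], 1.0])
        x1 = K_inv @ np.array([p1[0], p1[1], 1.0])
        points_3d.append(triangulate_pair(x0, x1, R, t))
    return np.array(points_3d)   # positions relative to the camera
```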
An embodiment of the present application further provides an electronic device, including: a processor and a memory;
the memory is used for storing computer programs;
when the processor is used for executing the computer program stored in the memory, the following steps are realized:
acquiring a current frame image;
analyzing the current frame image to obtain the coordinates of the key points of the reference static target in the current frame image as the coordinates of the first key points;
determining coordinates of a second key point corresponding to the first key point in a historical frame image before the current frame image to obtain a key point pair of the reference static target, wherein the first key point and the corresponding second key point form the key point pair of the reference static target;
calculating to obtain an essential matrix for representing coordinate transformation according to the coordinate of the first key point and the coordinate of the second key point in the key point pair of the reference static target;
determining a coordinate translation vector from the acquisition position of the historical frame image to the acquisition position of the current frame image;
acquiring key point pairs of a static target to be detected in the current frame image and the historical frame image;
and calculating to obtain the relative position of the static target to be detected according to the key point pairs of the static target to be detected, the essential matrix and the coordinate translation vector.
Optionally, referring to fig. 10, the electronic device according to the embodiment of the present application further includes a communication interface 202 and a communication bus 204, where the processor 201, the communication interface 202, and the memory 203 complete communication with each other through the communication bus 204.
Optionally, the processor is configured to implement any of the image-based position determining methods when executing the computer program stored in the memory.
The communication bus mentioned in the electronic device may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a RAM (Random Access Memory) or an NVM (Non-Volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
An embodiment of the present application further provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements any of the image-based position determining methods described above.
In yet another embodiment provided by the present application, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the image-based position determination methods of the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, digital subscriber line) or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It should be noted that, in this document, the technical features of the various alternatives may be combined to form further schemes as long as those features are not contradictory, and such schemes fall within the scope of the disclosure of the present application. Relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," and any other variations thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises that element.
The embodiments in this specification are described in a related manner; identical or similar parts among the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the embodiments of the electronic device, the storage medium, and the computer program are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, reference may be made to the corresponding parts of the description of the method embodiments.
The above description covers only preferred embodiments of the present application and is not intended to limit its scope. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included within its protection scope.

Claims (9)

1. An image-based position determination method, the method comprising:
acquiring a current frame image;
analyzing the current frame image to obtain the coordinates of key points of a reference static target in the current frame image as the coordinates of a first key point;
determining coordinates of a second key point corresponding to the first key point in a historical frame image before the current frame image to obtain a key point pair of the reference static target, wherein the first key point and the second key point corresponding to the first key point form the key point pair of the reference static target;
calculating an essential matrix representing the coordinate transformation according to the coordinates of the first key point and the coordinates of the second key point in the key point pair of the reference static target, wherein the essential matrix comprises a normalized translation vector and a rotation matrix;
determining a coordinate translation vector from the acquisition position of the historical frame image to the acquisition position of the current frame image;
acquiring key point pairs of a static target to be detected in the current frame image and the historical frame image;
acquiring coordinates of a third key point and a fourth key point in the key point pair in a coordinate system of image acquisition equipment for each key point pair in the static target to be detected, wherein the third key point is a key point of the static target to be detected in the current frame image, and the fourth key point is a key point of the static target to be detected in the historical frame image;
using the formulas s0·x0 = s1·R·x1 + t and s1·Λ(x0)·R·x1 = −Λ(x0)·t, calculating a first scale s1 and a second scale s0, wherein x0 and x1 respectively represent the coordinates of the third key point and the fourth key point in the coordinate system of the image acquisition equipment, R represents the rotation matrix, t represents the coordinate translation vector, and Λ(x0) represents the antisymmetric matrix corresponding to x0;
according to the formula s0·x0 = K·P, calculating the relative position of the static target to be detected, wherein K represents the internal parameter matrix of the image acquisition equipment and P represents the three-dimensional coordinate point, relative to the image acquisition equipment, of the point in the world coordinate system corresponding to the third key point, the relative position of the static target to be detected being represented by the three-dimensional coordinate point.
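For illustration only, the following is a minimal Python sketch of the scale-recovery chain in claim 1. The use of cv2.findEssentialMat / cv2.recoverPose to obtain R and the normalized translation, the RANSAC settings, and all function and parameter names are assumptions of this sketch, not the patent's prescribed implementation.

```python
import numpy as np
import cv2

def skew(v):
    """Antisymmetric matrix Lambda(v) such that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def relative_position(ref_hist, ref_cur, x1_pix, x0_pix, K, move_distance):
    """Sketch of claim 1. ref_hist/ref_cur: Nx2 pixel coordinates of the
    reference static target's key point pairs (second/first key points);
    x1_pix/x0_pix: one key point pair of the target to be detected
    (fourth/third key points); move_distance: distance travelled between
    the two acquisition positions (claim 6)."""
    # Essential matrix from the reference target's key point pairs.
    E, _ = cv2.findEssentialMat(ref_hist, ref_cur, K, method=cv2.RANSAC)
    _, R, t_unit, _ = cv2.recoverPose(E, ref_hist, ref_cur, K)

    # Claim 6: coordinate translation vector = distance x normalized translation.
    t = move_distance * t_unit.ravel()

    Kinv = np.linalg.inv(K)
    x0 = Kinv @ np.array([x0_pix[0], x0_pix[1], 1.0])   # third key point
    x1 = Kinv @ np.array([x1_pix[0], x1_pix[1], 1.0])   # fourth key point

    # s1 * Lambda(x0) R x1 = -Lambda(x0) t, solved by least squares.
    L = skew(x0)
    A = (L @ R @ x1).reshape(3, 1)
    s1 = np.linalg.lstsq(A, -(L @ t), rcond=None)[0].item()

    # s0 x0 = s1 R x1 + t  ->  read off the depth (z) component.
    s0 = (s1 * (R @ x1) + t)[2] / x0[2]

    # s0 x0_pix = K P  <=>  P = s0 * Kinv @ [u, v, 1] = s0 * x0.
    return s0 * x0   # 3D position of the target point relative to the camera
```

Note that the second formula of claim 1 is just the first formula left-multiplied by Λ(x0), which eliminates the unknown s0 because Λ(x0)·x0 = 0.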
2. The method according to claim 1, wherein the key points of the reference static target are corner points of the reference static target, and the analyzing the current frame image to obtain the coordinates of the key points of the reference static target in the current frame image as the coordinates of the first key points comprises:
determining the area where the reference static target in the current frame image is located;
and carrying out corner point detection on the area where the reference static target is located to obtain coordinates of the corner points of the reference static target as coordinates of the first key point.
3. The method according to claim 2, wherein the performing corner detection on the region where the reference static target is located to obtain coordinates of the corner of the reference static target as coordinates of the first key point comprises:
carrying out corner detection on the region where the reference static target is located to obtain an initial corner of the reference static target;
performing sub-pixelation processing on the current frame image, and performing grid division on the sub-pixelation processed current frame image according to a preset interval;
and performing non-maximum suppression on the initial corner points in each grid of the current frame image after sub-pixelation processing to obtain the coordinates of the corner points of the reference static target after non-maximum suppression as the coordinates of the first key points.
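A sketch of the corner pipeline in claims 2-3, under stated assumptions: Shi-Tomasi as the initial detector, a Harris response as the suppression score, and a 32-pixel grid interval; none of these choices is fixed by the claims.

```python
import numpy as np
import cv2

def reference_corners(gray, roi, cell=32):
    """Sketch of claims 2-3: corners inside the reference target's region,
    sub-pixel refinement, then one-corner-per-grid-cell non-maximum
    suppression."""
    x, y, w, h = roi
    # Initial corners of the reference static target (Shi-Tomasi stand-in).
    pts = cv2.goodFeaturesToTrack(gray[y:y + h, x:x + w], maxCorners=200,
                                  qualityLevel=0.01, minDistance=5)
    if pts is None:
        return np.empty((0, 2), np.float32)
    pts = pts.reshape(-1, 2).astype(np.float32) + np.float32([x, y])

    # Sub-pixel refinement of the corner coordinates.
    term = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.01)
    cv2.cornerSubPix(gray, pts, (5, 5), (-1, -1), term)

    # Grid division at a preset interval + non-maximum suppression:
    # keep only the strongest corner in each cell.
    response = cv2.cornerHarris(gray, 2, 3, 0.04)
    best = {}
    for p in pts:
        u = min(max(int(round(p[0])), 0), gray.shape[1] - 1)
        v = min(max(int(round(p[1])), 0), gray.shape[0] - 1)
        key = (u // cell, v // cell)
        if key not in best or response[v, u] > best[key][0]:
            best[key] = (response[v, u], p)
    return np.float32([p for _, p in best.values()])
```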
4. The method of claim 1, wherein determining coordinates of a second keypoint corresponding to the first keypoint in a history frame image before the current frame image to obtain the keypoint pair of the reference static target comprises:
selecting one frame image from among the frame images acquired before the current frame image as a historical frame image;
and tracking a point with the luminosity most similar to that of the first key point in the historical frame image by using an optical flow tracking algorithm to serve as a second key point, so as to obtain a key point pair of the reference static target.
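Pyramidal Lucas-Kanade is one common realization of "tracking the photometrically most similar point" named in claim 4; the sketch below assumes it, along with illustrative window and pyramid settings.

```python
import numpy as np
import cv2

def track_to_history(cur_gray, hist_gray, first_pts):
    """Sketch of claim 4: track each first key point into the historical
    frame with pyramidal Lucas-Kanade optical flow."""
    second_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        cur_gray, hist_gray,
        first_pts.reshape(-1, 1, 2).astype(np.float32), None,
        winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1
    # Surviving (first, second) pairs are the reference target's key point pairs.
    return first_pts[ok], second_pts.reshape(-1, 2)[ok]
```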
5. The method according to claim 4, wherein the selecting one of the frame images used before the current frame image as a history frame image comprises:
selecting an Nth frame of image collected before the current frame of image as an image to be compared, wherein N is a positive integer;
judging whether the parallax between the selected image to be compared and the current frame image is larger than a preset parallax threshold value or not;
if the parallax is not larger than the preset parallax threshold, increasing N by a specified value and returning to the step of selecting the Nth frame of image acquired before the current frame image as the image to be compared;
and if the parallax is larger than the preset parallax threshold value, taking the selected image to be compared as the historical frame image.
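The selection loop of claim 5 can be sketched as follows; the `disparity(a, b)` helper (e.g., the mean key point displacement between two frames) is an assumption, since the claim does not fix the parallax measure.

```python
def select_history_frame(frames, cur_idx, disparity, threshold, step=1):
    """Sketch of claim 5: step back N frames until the parallax against
    the current frame exceeds the preset threshold."""
    n = step
    while cur_idx - n >= 0:
        candidate = frames[cur_idx - n]
        if disparity(candidate, frames[cur_idx]) > threshold:
            return candidate      # parallax large enough: use as history frame
        n += step                 # otherwise increase N by the specified value
    return None                   # no acquired frame satisfies the threshold
```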
6. The method of claim 1, wherein determining a coordinate translation vector from an acquisition location of the historical frame image to an acquisition location of the current frame image comprises:
acquiring the distance from the acquisition position of the historical frame image to the acquisition position of the current frame image as a movement distance;
and calculating a coordinate translation vector from the acquisition position of the historical frame image to the acquisition position of the current frame image according to the normalized translation vector and the movement distance.
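Claim 6 reduces to scaling the essential matrix's unit-norm translation by the travelled distance; a two-line sketch, where the distance source (e.g., wheel odometry or GNSS) is an assumption:

```python
import numpy as np

def coordinate_translation(t_normalized, move_distance):
    """Sketch of claim 6: metric translation vector between the two
    acquisition positions from the normalized translation and the distance."""
    t_unit = np.asarray(t_normalized, float).ravel()
    return move_distance * t_unit / np.linalg.norm(t_unit)
```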
7. The method according to claim 1, wherein the obtaining key point pairs of the static target to be detected in the current frame image and the historical frame image comprises:
matching each static target of the current frame image with each static target of the historical frame image to obtain full matching results, wherein each full matching result comprises a plurality of static target pairs, and each static target pair comprises one static target in the current frame image and one static target in the historical frame image;
respectively calculating the distance between two static targets in each static target pair to obtain the distance parameter of each static target pair;
calculating the sum of the distance parameters of each static target pair in the full matching result aiming at each full matching result to obtain the total distance of the full matching result;
selecting a full-matching result with the minimum total distance as a target full-matching result;
selecting a static target pair corresponding to a target to be detected from each static target pair of the target full-matching result to obtain a first static target and a second static target, wherein the first static target is a static target in the current frame image, and the second static target is a static target in the historical frame image;
acquiring coordinates of each key point of the first static target under a coordinate system of image acquisition equipment;
and respectively determining the coordinates of the key points of the first static target in the second static target according to the coordinates of the key points of the first static target and the essential matrix to obtain the key point pairs of the static target to be detected.
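The matching step of claim 7 enumerates full matchings and keeps the one with minimum total distance; a literal sketch, where `center(t)` (a 2D target centroid used as the distance parameter) is an assumed helper:

```python
from itertools import permutations
import numpy as np

def best_full_matching(cur_targets, hist_targets, center):
    """Sketch of claim 7's matching step: score every full matching by the
    sum of per-pair distances and keep the minimum total."""
    assert len(cur_targets) == len(hist_targets)
    best_pairs, best_total = None, np.inf
    for perm in permutations(range(len(hist_targets))):
        pairs = [(c, hist_targets[j]) for c, j in zip(cur_targets, perm)]
        total = sum(np.linalg.norm(center(a) - center(b)) for a, b in pairs)
        if total < best_total:
            best_pairs, best_total = pairs, total   # target full-matching result
    return best_pairs
```

Enumeration is factorial in the number of targets; the same minimization is the classic assignment problem, which scipy.optimize.linear_sum_assignment would solve in polynomial time.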
8. An electronic device comprising a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to implement the method of any one of claims 1-7 when executing the program stored in the memory.
9. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 7.
CN202010947211.1A 2020-09-10 2020-09-10 Image-based position determination method, electronic device, and storage medium Active CN112132895B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010947211.1A CN112132895B (en) 2020-09-10 2020-09-10 Image-based position determination method, electronic device, and storage medium


Publications (2)

Publication Number Publication Date
CN112132895A CN112132895A (en) 2020-12-25
CN112132895B true CN112132895B (en) 2021-07-20

Family

ID=73845392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010947211.1A Active CN112132895B (en) 2020-09-10 2020-09-10 Image-based position determination method, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN112132895B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9959625B2 (en) * 2015-12-29 2018-05-01 The United States Of America As Represented By The Secretary Of The Air Force Method for fast camera pose refinement for wide area motion imagery
US10445694B2 (en) * 2017-08-07 2019-10-15 Standard Cognition, Corp. Realtime inventory tracking using deep learning
CN111161408B (en) * 2019-12-27 2021-12-21 华南理工大学 Method for realizing augmented reality, application thereof and computing equipment
CN111462210B (en) * 2020-03-31 2023-06-16 华南理工大学 Monocular line feature map construction method based on epipolar constraint

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110243358A (en) * 2019-04-29 2019-09-17 武汉理工大学 The unmanned vehicle indoor and outdoor localization method and system of multi-source fusion
CN110853151A (en) * 2019-10-15 2020-02-28 西安理工大学 Three-dimensional point set recovery method based on video
CN111063027A (en) * 2019-12-27 2020-04-24 河北工程大学 Three-dimensional reconstruction data conduction system of digital holographic microscopic imaging equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Accumulative Errors Optimization for Visual Odometry of ORB-SLAM2 Based on RGB-D Cameras; Jiangying Qin et al.; MDPI; 2019-12-11; pp. 1-19 *
Research on VSLAM Based on the Feature Point Method and the Direct Method; Zou Xiong et al.; Application Research of Computers; 2020-05-31; pp. 1281-1291 *

Also Published As

Publication number Publication date
CN112132895A (en) 2020-12-25

Similar Documents

Publication Publication Date Title
CN108961327B (en) Monocular depth estimation method and device, equipment and storage medium thereof
US10311595B2 (en) Image processing device and its control method, imaging apparatus, and storage medium
US10366504B2 (en) Image processing apparatus and image processing method for performing three-dimensional reconstruction of plurality of images
US9679384B2 (en) Method of detecting and describing features from an intensity image
CN108369741A (en) Method and system for registration data
KR102169309B1 (en) Information processing apparatus and method of controlling the same
EP3093822B1 (en) Displaying a target object imaged in a moving picture
US10607350B2 (en) Method of detecting and describing features from an intensity image
Benseddik et al. SIFT and SURF Performance evaluation for mobile robot-monocular visual odometry
JP2019114103A (en) Object recognition processing device, object recognition processing method and program
CN105934757A (en) Method and apparatus for detecting incorrect associations between keypoints of first image and keypoints of second image
EP3438875A1 (en) Image processing apparatus and control method therefor
US11189053B2 (en) Information processing apparatus, method of controlling information processing apparatus, and non-transitory computer-readable storage medium
KR101995466B1 (en) Stereo image matching based on feature points
CN113888583A (en) Real-time judgment method and device for visual tracking accuracy
CN112132895B (en) Image-based position determination method, electronic device, and storage medium
CN112150522A (en) Remote sensing image registration method, device, equipment, storage medium and system
CN116977671A (en) Target tracking method, device, equipment and storage medium based on image space positioning
JP2006113832A (en) Stereoscopic image processor and program
CN110533663B (en) Image parallax determining method, device, equipment and system
CN115222789A (en) Training method, device and equipment for instance depth estimation model
CN112348876B (en) Space coordinate acquisition method and device for signboards
CN114511631A (en) Method and device for measuring height of visual object of camera and computer readable storage medium
WO2017042852A1 (en) Object recognition appratus, object recognition method and storage medium
Urmanov et al. Computer methods of image processing of volcanoes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220318

Address after: 430051 No. b1336, chuanggu startup area, taizihu cultural Digital Creative Industry Park, No. 18, Shenlong Avenue, Wuhan Economic and Technological Development Zone, Hubei Province

Patentee after: Yikatong (Hubei) Technology Co.,Ltd.

Address before: No.c101, chuanggu start up zone, taizihu cultural Digital Industrial Park, No.18 Shenlong Avenue, Wuhan Economic and Technological Development Zone, Hubei Province

Patentee before: HUBEI ECARX TECHNOLOGY Co.,Ltd.