CN115272450A - Target positioning method based on panoramic segmentation - Google Patents

Target positioning method based on panoramic segmentation

Info

Publication number
CN115272450A
CN115272450A (application CN202210199052.0A)
Authority
CN
China
Prior art keywords
image
points
matching
target
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210199052.0A
Other languages
Chinese (zh)
Inventor
张永生
吕可枫
戴晨光
纪松
于英
张振超
李力
李磊
张磊
闵杰
王自全
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force filed Critical Information Engineering University of PLA Strategic Support Force
Priority to CN202210199052.0A priority Critical patent/CN115272450A/en
Publication of CN115272450A publication Critical patent/CN115272450A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 17/05 Geographic models
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/10012 Stereo images

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Remote Sensing (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a target positioning method based on panoramic segmentation and belongs to the field of target positioning. First, a real scene image of a target area is acquired with an image acquisition sensor, and the initial pose of the sensor is obtained; a rendered image is generated from the initial pose of the sensor and a previously established three-dimensional geospatial model. Panoramic segmentation is then applied to the real scene image and the rendered image, and corner detection and matching are performed on the segmentation results to obtain matching point pairs between the two images. The real pose of the sensor is calculated from the matching point pairs and the initial pose of the sensor, and finally targets in the real scene are positioned using the real pose of the sensor. Compared with prior-art approaches that rely on deep learning for matching, the method requires no large number of samples; it determines the matching point pairs of the two images with a corner detection method, which improves working efficiency and the matching precision between heterogeneous images, and positioning targets with the real pose of the sensor improves target positioning precision.

Description

Target positioning method based on panoramic segmentation
Technical Field
The invention relates to a target positioning method based on panoramic segmentation, and belongs to the field of target positioning.
Background
With the rapid development of technologies such as the Internet of Things and geospatial science, the digital wave is sweeping the world, and geospatial intelligence, as a cross-disciplinary field in which geospatial science and artificial intelligence deeply fuse, is showing enormous vitality and potential. Due to the enhancement of comprehensive computing power, continuous progress in data and model sharing, and the improving performance and falling cost of various sensors, geospatial intelligent perception is attracting more and more research investment. At present, three-dimensional reconstruction of geographic space has achieved abundant research results and can express static scenes well; however, the world is dynamic, and only by dynamically perceiving and understanding geographic space can related applications such as autonomous driving, augmented reality and digital twins be well served. To realize geospatial intelligent perception, perceiving and positioning dynamic targets in complex geographic space is an indispensable core link.
In the process of perceiving and positioning a dynamic target, a live-action three-dimensional model with geographic reference coordinates must be established in advance for the target area, and perception and positioning of targets in the scene is completed by accurately positioning the sensor carrier (such as an unmanned aerial vehicle or an unmanned vehicle); accurate registration of the sensor with the three-dimensional model is therefore required. Methods for registering the sensor with the three-dimensional model fall mainly into two categories. The first is geographic registration based on pose sensors, for example manual placement or SLAM (Simultaneous Localization and Mapping) based methods that realize simple registration and are applied to small scenes. Although such methods are efficient and adaptable to various environments, their registration accuracy is insufficient, and if the result is used as a prior pose to locate other targets in the scene, larger errors accumulate. The second category introduces other data for matching: for example, a two-dimensional image of the target area, such as an RGB image or a ground camera image, is matched against the acquired live-action three-dimensional model through the features of the two-dimensional image and the three-dimensional model; this two-dimensional-three-dimensional matching establishes the relationship between the image and the model and enables accurate positioning of the target. In addition, the precision of the introduced data directly influences the matching precision.
Meanwhile, the two-dimensional-three-dimensional matching process is also affected by the heterogeneity of the images: factors such as the imaging mechanism and temporal differences introduce certain differences between the features of heterogeneous images. Among traditional matching methods, feature-based matching extracts local features in a certain neighborhood as descriptors, most notably the Scale Invariant Feature Transform (SIFT) descriptor and the algorithms improved on its basis; most of these methods cannot provide stable features, so it is difficult to obtain good matching results in heterogeneous matching. Template-based matching can obtain invariant features over a larger range, but its scalability and matching efficiency are limited and its generalization capability is insufficient. With the rapid development of deep learning in computer vision tasks in recent years, more and more deep learning matching algorithms have been proposed: SuperGlue introduces an attentional aggregation mechanism that jointly reasons about the underlying 3D scene and the features and obtains advanced matching results; DFM realizes image matching by exploiting the deepest invariance learned by an existing network combined with semantic features; and, building on D2-Net feature extraction, the heterogeneous remote sensing image matching network CMM-Net uses high-level semantic local features. However, matching methods based on deep learning place high demands on samples and require a large amount of learning; when facing image pairs of different quality and different texture characteristics, their matching results may differ greatly, so the matching precision of heterogeneous image pairs cannot be guaranteed, which directly affects the precision of dynamic target perception and positioning.
Disclosure of Invention
The invention aims to provide a target positioning method based on panoramic segmentation, so as to solve the problem of low target positioning precision caused by the inability of the prior art to guarantee the matching precision of heterogeneous images.
The invention provides a target positioning method based on panoramic segmentation, which comprises the following steps:
1) Acquiring a real scene image of a target area by using an image acquisition sensor, and acquiring an initial pose of the sensor; rendering the established geospatial three-dimensional model of the target area by using the initial pose of the sensor to obtain a rendered image of the target area under the initial pose;
2) Respectively carrying out panoramic segmentation on the rendered image and the real scene image of the target area to obtain segmentation results of the rendered image and the real scene image; respectively carrying out corner detection on the segmentation results of the rendered image and the real scene image to obtain candidate corners in each image, and determining key points of the corresponding images according to the candidate corners;
3) Constructing descriptors by using key points of the two images, and determining matching point pairs of the two images according to the matching degree of the descriptors in the two images;
4) The real pose of the sensor is back-calculated from the matching point pairs and the initial pose of the image acquisition sensor;
5) And positioning the target in the real scene by using the real pose of the sensor.
A rendered image of the target area is generated from the initial pose of the sensor and matched against the real scene image of the target area. Taking into account the feature differences between the two heterogeneous images, panoramic segmentation is applied to the real scene image and the rendered image, corner detection is performed on the segmentation results to obtain key points, and the matching point pairs of the two images are determined from the descriptors of the key points. Compared with the prior art that matches with deep learning methods, the invention needs neither a large number of samples nor consideration of sample quality, and therefore no sample learning; the matching point pairs of the two images are found by corner detection directly on the rendered image and the corresponding real scene image under the sensor pose, which improves working efficiency. Even when the images to be matched differ in quality and texture, accurate registration of the real scene image and the rendered image can still be achieved, the pose transformation relation between the heterogeneous images is obtained, the real pose of the sensor is determined, the target is positioned with the real pose of the sensor, and the positioning precision of the target is effectively improved.
Further, the key points are determined as follows: the pixel points on a target contour line in the image segmentation result are traversed; a square frame is established with the pixel point as its central point, and the square frame intersects the target contour line at two intersection points; the included angle formed by the two intersection points and the central point is calculated; the pixel points whose included angle lies within a set angle threshold range are taken as candidate corner points, and the candidate corner points are taken as key points.
Further, the key points are determined as follows: the pixel points on a target contour line in the image segmentation result are traversed; two square frames with different side lengths are established with the pixel point as their central point, and each square frame intersects the target contour line at two intersection points; the included angle formed by the two intersection points and the central point is calculated for each square frame; when the included angles under both square frames are within a set angle threshold range and the difference between the two included angles is smaller than a set difference threshold, the pixel point is taken as a candidate corner point, and the key points are determined from the candidate corner points.
In the invention, the edge contour lines produced by panoramic segmentation have pixel-level precision: the obtained edges may contain a large number of fine jagged segments and are not regular edge curves, so a traditional corner detection algorithm may detect a large number of wrong corner points, which would bring great error and extra workload to the subsequent descriptor construction and matching; the proposed detection avoids this. Meanwhile, either one square frame or two square frames can be established for a pixel point; using two square frames further improves detection accuracy and reduces, to a certain extent, the number of candidate corner points that would be found with a single frame, which also reduces the workload of subsequent matching.
Further, when a plurality of candidate corner points exist in the set corner point region, the candidate corner point with the smallest angle difference between two included angles is selected as a key point in the set corner point region.
To reduce the workload of subsequently constructing descriptors and of subsequent key point matching, when several key points fall within a certain region, the key point with the smallest difference between its two included angles is selected as the best key point of that region, further improving working efficiency.
Further, descriptors are constructed for the key points, the descriptors comprising a first descriptor and/or a second descriptor; the first descriptor is the mean of the two included angles obtained for the key point, and the second descriptor consists of the label values of the pixel points on the square frame with the smaller side length established at the key point, read in a set traversal order.
The construction of descriptors is carried out on the determined key points through the process, at least one descriptor is used as a basis for realizing the matching of the subsequent key points, and when only one descriptor is selected, the calculation amount is small, and the matching efficiency is high; when two descriptors are selected, the feature description of the key points is more accurate, and the matching precision is higher.
Further, the matching degree of the key points is determined according to the quotient of the first descriptors of the key points in the two images, and when the quotient value is closer to 1, the matching degree of the key points of the two images is higher.
The matching degree of the key points is determined by the quotient of the first descriptors of the key points in the two images: the closer the quotient is to 1, i.e. the closer the included angles corresponding to the key points in the two images are, the more likely it is that the two key points are corner points at the same position.
Further, the matching degree of the key points is determined according to the alignment degree of the second descriptors of the key points in the two images, wherein the alignment degree refers to the ratio of the number of the same labels at the same position in the second descriptors of the two key points to the total number of the labels, and when the ratio is closer to 1, the matching degree of the key points of the two images is higher.
The matching degree of the key points is determined by the alignment degree of the second descriptors of the key points in the two images, i.e. by checking whether the labels at corresponding positions on the square frames of the two second descriptors (the label value of each pixel point on the square frame) are consistent; the closer the ratio of the number of consistent labels to the total number of labels (the number of pixels on the square frame) is to 1, the more likely it is that the two key points are corner points at the same position.
Further, the determination formula of the matching degree of the two image key points is as follows:
match = ω1·match1 + ω2·match2
where match is the matching degree of the two image key points, match1 is the matching degree of the two image key points determined with the first descriptor, match2 is the matching degree determined with the second descriptor, ω1 and ω2 are the weight coefficients of match1 and match2 respectively, and ω1 + ω2 = 1.
According to the formula, the matching degree of key points in the real scene image and the rendered image can be determined, the matching point pairs of the two images can be determined according to the matching degree, and meanwhile, the weight coefficients of the two descriptors can be adaptively adjusted according to the importance degrees of different descriptors in the actual situation.
Further, the true pose of the sensor is calculated in the step 4) by adopting an epipolar geometric constraint method.
The motion relation between the real scene image and the rendered image can be rapidly and accurately determined by utilizing the epipolar geometric constraint method, and the real pose of the sensor can be obtained according to the initial pose of the sensor.
Further, a MaskFormer network is adopted in the step 2) to perform panoramic segmentation on the image.
The invention adopts a MaskFormer network to realize semantic segmentation and example segmentation of the image at the same time.
Drawings
FIG. 1 is a detailed flow chart of the present invention for target localization based on panorama segmentation;
FIG. 2 is a flow chart of matching a real scene image with a rendered image according to the present invention;
fig. 3 is a schematic diagram of epipolar geometry constraints.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings.
The invention provides a target positioning method based on panoramic segmentation, and the specific flow of the method is shown in figure 1. Firstly, acquiring a real scene image of a target area by using an image acquisition sensor, and acquiring an initial pose of the sensor; generating a rendering image according to the initial pose of the sensor and the established three-dimensional geographic space model; then, carrying out panoramic segmentation on the real scene image and the rendered image, and carrying out corner point detection and matching on the segmentation result to obtain a matching point pair of the two images; calculating the real pose of the sensor by using the matching point pairs and the initial pose of the sensor; and finally, positioning the target in the real scene by using the real pose of the sensor.
The specific process of matching the rendered image with the real scene image is shown in fig. 2: panoramic segmentation is performed on the real scene image and the model-rendered image to obtain segmentation results; corner detection is performed on the target contour lines in the segmentation results; the detected corner points are screened, and the corner points meeting the set conditions are taken as key points; a descriptor is established for each key point, and the matching point pairs that are finally matched successfully are obtained according to the matching degree of the descriptors.
Example 1:
step 1. Obtaining data
To realize target positioning in a target area, two images are needed in total: one is a real scene image of the target area; the other is a rendered image corresponding to the pose of the sensor at the time of image acquisition. The invention uses an image acquisition sensor mounted on a sensor carrier (e.g., a vehicle or an unmanned aerial vehicle) to acquire the real scene image of the target area. In this embodiment, the image acquisition sensor is a depth camera, and shooting with the depth camera yields a depth image of the real scene in the target area. The initial pose of the sensor is obtained from the GPS and IMU systems carried on the sensor carrier and pre-calibrated parameters; the rendered image under the initial pose, i.e. the rendered image under the corresponding viewing angle, is generated from the initial pose of the sensor and the established three-dimensional geospatial model. The three-dimensional geographic model can be constructed from aerial images or radar images.
Step 2. Panorama segmentation
The invention performs panoramic segmentation on the captured real scene image and the rendered image respectively to obtain the panoramic segmentation result of each image; as shown in fig. 2, the contour lines of the objects can be obtained from the segmentation results of the two images. MaskFormer is adopted as the panoramic segmentation network, which realizes instance segmentation and semantic segmentation of the image in a unified way. Different from the semantic segmentation task, instance segmentation needs to further distinguish different targets: for two cats in a scene, semantic segmentation classifies all pixel points belonging to cats, while instance segmentation further distinguishes which pixels belong to the first cat and which to the second. MaskFormer introduces the bipartite matching loss proposed in DETR and uses a Transformer as the decoder to compute a set of pairs, each pair consisting of a class prediction vector and a mask embedding vector, thereby solving semantic-level and instance-level segmentation in a unified manner. The model used here takes Swin-L as the network backbone and obtains 52.7 PQ (panoptic quality) on the COCO data set, showing a good segmentation effect. The COCO training set used by the model comprises 80 instance labels, including people, bicycles and automobiles, and 52 semantic labels, including buildings, roads and sky, which covers most natural scenes. Meanwhile, to meet the real-time requirement, redundant kernels are removed from each deconvolution layer, so the segmentation time per image is slightly reduced on the premise that the accuracy does not drop. For the panoramic segmentation result, the semantic segmentation result of the static area is used to match the real scene image with the rendered image, and the target instance segmentation result in the real scene image is used to perceive and position the targets in the scene.
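The patent does not specify an implementation; the following is a minimal sketch, assuming the publicly released MaskFormer (Swin-L, COCO panoptic) checkpoint and the panoptic post-processing API of the Hugging Face transformers library, of obtaining the panoramic segmentation label maps for both heterogeneous images. The file names are hypothetical.

# A minimal sketch (not from the patent) of panoramic segmentation of the real scene image and
# the rendered image with a public MaskFormer checkpoint; model name and post-processing API
# are assumptions about the Hugging Face "transformers" library.
import torch
from PIL import Image
from transformers import MaskFormerImageProcessor, MaskFormerForInstanceSegmentation

processor = MaskFormerImageProcessor.from_pretrained("facebook/maskformer-swin-large-coco")
model = MaskFormerForInstanceSegmentation.from_pretrained("facebook/maskformer-swin-large-coco")
model.eval()

def panoptic_segment(image_path: str):
    """Return a label map (H, W) and per-segment metadata for one image."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Merge the per-query masks into one panoptic label map at the original resolution.
    result = processor.post_process_panoptic_segmentation(
        outputs, target_sizes=[image.size[::-1]]
    )[0]
    return result["segmentation"], result["segments_info"]

# Segment both heterogeneous images; contour lines are then extracted from the label maps.
real_seg, real_info = panoptic_segment("real_scene.jpg")
rendered_seg, rendered_info = panoptic_segment("rendered_view.jpg")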
Step 3, key point extraction
Because the contour lines are pixel-level classification results obtained by panoramic segmentation, they may contain a large number of fine jagged segments rather than regular edge curves, so a conventional corner detection algorithm may detect a large number of wrong corner points, bringing large errors and extra workload to the subsequent descriptor construction and matching. To improve efficiency and accuracy, the invention applies a corner detection method based on an angle threshold to the panoramically segmented rendered image and real scene image and extracts key points. The extraction process is the same for both images; the following description takes the segmented real scene image as an example, and the specific flow is as follows:
First, the pixel points on a target contour line in the real scene image segmentation result are traversed. As shown in fig. 2, two square frames with different side lengths are established with the pixel point as their central point; each square frame intersects the target contour line at two intersection points, and the included angle formed by the two intersection points and the central point is calculated for each square frame. When the included angles under both square frames are within the set angle threshold range and the difference between the two included angles is smaller than the set difference threshold, the pixel point is taken as a candidate corner point, and the key points are determined from the candidate corner points. In this embodiment two square frames are established, one with a side length of 11 pixels and one with a side length of 19 pixels; the set angle threshold range is [60°, 140°] and the set difference threshold is 10°. That is, when the included angle between the intersection points of a square frame with the target contour line and the central point lies within the set angle threshold range ([60°, 140°]), the pixel point (central point) is not collinear with the intersection points, i.e. it may be a corner point; meanwhile, to limit the computation caused by too many corner points, the central point is taken as a candidate corner point only if the difference between the included angles obtained under the two square frames is within 10°. As another embodiment, the side length of the square frames may be chosen according to the resolution of the actual image: the higher the image resolution, the larger the side length may be set.
Because several pixel points in a set corner region may qualify as candidate corner points, the candidate corner point whose two included angles (under the two square frames) differ the least is selected as the key point of that region. When only one candidate corner point lies in a set corner region, it is taken as the key point. The size of the set corner region is determined by the square frame with the longer side length; for example, with side lengths of 11 and 19 pixels, the set region is 19 pixels × 19 pixels. As another embodiment, all candidate corner points may be taken as key points.
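The patent gives no reference implementation; the following is a sketch of the step-3 corner detector under stated assumptions: each segmented region's contour is available as an ordered list of (row, col) pixels, the box "intersections" are approximated by walking along the contour until the Chebyshev distance from the centre reaches the half side length, and the thresholds follow this embodiment (11- and 19-pixel frames, [60°, 140°], 10° difference, 19 × 19 selection region). The helper names are hypothetical.

# Sketch of angle-threshold corner detection and key point selection on one contour.
import numpy as np

def box_intersections(contour, idx, half):
    """Return the two contour points where the square box of half-width `half`,
    centred on contour[idx], is first crossed when walking away from the centre."""
    cx, cy = contour[idx]
    hits = []
    for step in (-1, 1):                      # walk backwards and forwards along the contour
        j = idx
        while 0 <= j + step < len(contour):
            j += step
            px, py = contour[j]
            if max(abs(px - cx), abs(py - cy)) >= half:   # reached the box boundary
                hits.append((px, py))
                break
    return hits if len(hits) == 2 else None

def included_angle(center, p1, p2):
    """Angle (degrees) at `center` between the rays towards p1 and p2."""
    v1 = np.array(p1, float) - np.array(center)
    v2 = np.array(p2, float) - np.array(center)
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def detect_keypoints(contour, small=11, large=19, ang_lo=60, ang_hi=140, max_diff=10):
    candidates = []                           # (contour index, small-box angle, angle difference)
    for i in range(len(contour)):
        hit_s = box_intersections(contour, i, small // 2)
        hit_l = box_intersections(contour, i, large // 2)
        if hit_s is None or hit_l is None:
            continue
        a_s = included_angle(contour[i], *hit_s)
        a_l = included_angle(contour[i], *hit_l)
        if ang_lo <= a_s <= ang_hi and ang_lo <= a_l <= ang_hi and abs(a_s - a_l) < max_diff:
            candidates.append((i, a_s, abs(a_s - a_l)))
    # Keep, within every `large`-pixel neighbourhood, only the candidate whose two
    # included angles agree best (smallest angle difference).
    keypoints = []
    for i, a, d in sorted(candidates, key=lambda c: c[2]):
        if all(np.max(np.abs(np.array(contour[i]) - np.array(contour[k]))) >= large
               for k, _, _ in keypoints):
            keypoints.append((i, a, d))
    return keypoints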
Step 4, constructing a key point descriptor
A descriptor is constructed for each key point, and key point matching is realized through the descriptors. The descriptors are of two types (a first descriptor and a second descriptor). The first descriptor is an angle descriptor: as shown in formula (1), the two included angles obtained for the key point in step 3 are averaged (taking the angle orientation into account), and this mean angle is the first descriptor. The second descriptor takes the top-left vertex of the square frame with the smaller side length as the starting point and records the semantic label value of each pixel point on the frame in clockwise order, as shown in formula (2), where n equals the number of pixel points on the smaller square frame.
def1 = (θ1 + θ2)/2 (1)
def2 = [label1, label2, label3, ......, labeln] (2)
where θ1 and θ2 in formula (1) are the included angles measured at the key point under the two square frames.
The semantic labels represent the classification of objects in the object region, and different labels (different objects) can be represented by different values, for example, a building is represented by label =1, a road is represented by label =2, and the like.
As other embodiments, only one descriptor, for example, only the first descriptor, may be constructed, or two descriptors may be constructed at the same time.
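As a sketch of step 4 under the same assumptions (the helper names and the contour/label-map representation are hypothetical, not from the patent), the two descriptors could be computed as follows; for an 11-pixel frame the second descriptor has 4 × 11 - 4 = 40 entries, matching the example in step 5.

# Sketch of constructing the two key point descriptors from the segmentation label map.
import numpy as np

def first_descriptor(angle_small: float, angle_large: float) -> float:
    """Mean of the included angles measured under the two square frames (formula (1))."""
    return 0.5 * (angle_small + angle_large)

def second_descriptor(label_map: np.ndarray, center, side: int = 11) -> list:
    """Labels of the pixels on the smaller square frame, clockwise from the top-left vertex
    (formula (2)); points outside the image are marked with -1."""
    r0, c0 = center
    h = side // 2
    ring = []
    ring += [(r0 - h, c0 - h + k) for k in range(side)]            # top edge, left -> right
    ring += [(r0 - h + k, c0 + h) for k in range(1, side)]         # right edge, top -> bottom
    ring += [(r0 + h, c0 + h - k) for k in range(1, side)]         # bottom edge, right -> left
    ring += [(r0 + h - k, c0 - h) for k in range(1, side - 1)]     # left edge, bottom -> top
    H, W = label_map.shape
    return [int(label_map[r, c]) if 0 <= r < H and 0 <= c < W else -1 for r, c in ring]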
Step 5, determining matching point pairs
The method determines the matching point pairs of the two images according to the matching degree of the descriptors in the two images.
For the first descriptor, the matching degree of the key points is determined from the quotient of the first descriptors of the key points in the two images; the closer the quotient is to 1, the higher the matching degree of the two image key points. As shown in formula (3), def1_s denotes the first descriptor with the smaller angle and def1_l the first descriptor with the larger angle; the closer the value of match1 is to 1, the better the two descriptors match and the higher the matching degree of the two image key points.
match1 = def1_s / def1_l (3)
For the second descriptor, the matching degree is described by the alignment degree of the two descriptors: the alignment degree is the ratio of the number of identical labels at the same positions in the second descriptors of the two key points to the total number of labels, and the closer this ratio is to 1, the higher the matching degree of the two image key points. The calculation formula is as follows:
match2 = Ntrue / N (4)
where Ntrue is the number of identical labels at the same sequence positions in the two def2 descriptors and N is the total number of labels; the closer the value of match2 is to 1, the better the two descriptors match and the higher the matching degree of the two image key points. For example, if n in formula (2) is 40 (i.e. the side length of the smaller square box is 11 pixels), the second descriptor of a key point in the real scene image is [labels1, labels2, labels3, ......, labelsn] and the second descriptor of a key point in the rendered image is [labelt1, labelt2, labelt3, ......, labeltn]; whether labels1 and labelt1, labels2 and labelt2, ......, labelsn and labeltn are equal is judged in turn, and the number of positions with labelsi = labelti, i.e. the number of identical labels at the same positions, is counted. In formula (4), Ntrue is therefore the number of positions with labelsi = labelti, and N is the number of pixel points on the square frame (40).
When only one descriptor is established in step 4, the matching point pairs of the two images are determined using the matching degree of that descriptor; the calculation amount is small and the matching efficiency is high. For example, when only the first descriptor is used, two key points are determined to be a matching point pair when the quotient of their first descriptors (match1) is within a first set threshold range; when only the second descriptor is used, two key points are determined to be a matching point pair when the alignment degree of their second descriptors (match2) is within a second set threshold range. When both descriptors are constructed, the matching point pairs can be determined by a weighted combination of the matching degrees of the two descriptors; the feature description of the key points is then more accurate and the matching precision higher, as shown in formula (5):
match = ω1·match1 + ω2·match2 (5)
where match is the matching degree of the two image key points, match1 is the matching degree of the two image key points determined with the first descriptor, match2 is the matching degree determined with the second descriptor, ω1 and ω2 are the weight coefficients of match1 and match2 respectively, and ω1 + ω2 = 1. When match is within a third set threshold range, the two key points are considered a matching point pair.
In this embodiment, the first set threshold, the second set threshold and the third set threshold are all [0.7, 1]. In other embodiments the threshold can be selected according to the image quality: if the image quality (for example, sharpness) is not high, the lower bound can be adjusted appropriately, for example the set threshold can be relaxed to [0.6, 1].
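A sketch of the step-5 matching rule under the assumptions above: formulas (3)-(5) are implemented directly, while the greedy one-to-one pairing strategy and the equal weights ω1 = ω2 = 0.5 are assumptions for illustration (the patent only requires ω1 + ω2 = 1 and a threshold such as [0.7, 1]).

# Sketch of computing matching degrees and selecting matching point pairs.
def match1(def1_a: float, def1_b: float) -> float:
    """Formula (3): quotient of the first descriptors, arranged so the value is <= 1."""
    lo, hi = sorted((def1_a, def1_b))
    return lo / hi if hi > 0 else 0.0

def match2(def2_a: list, def2_b: list) -> float:
    """Formula (4): ratio of identical labels at identical positions to the label total."""
    same = sum(1 for a, b in zip(def2_a, def2_b) if a == b)
    return same / len(def2_a)

def match_score(def1_a, def2_a, def1_b, def2_b, w1=0.5, w2=0.5) -> float:
    """Formula (5): weighted combination of the two matching degrees."""
    return w1 * match1(def1_a, def1_b) + w2 * match2(def2_a, def2_b)

def find_pairs(keypoints_real, keypoints_rendered, threshold=0.7):
    """Greedy one-to-one pairing of key points whose combined score falls in [threshold, 1].
    Each key point is assumed to be a (position, def1, def2) tuple."""
    pairs, used = [], set()
    for i, (_, d1_r, d2_r) in enumerate(keypoints_real):
        best, best_j = threshold, None
        for j, (_, d1_m, d2_m) in enumerate(keypoints_rendered):
            if j in used:
                continue
            s = match_score(d1_r, d2_r, d1_m, d2_m)
            if s >= best:
                best, best_j = s, j
        if best_j is not None:
            pairs.append((i, best_j))
            used.add(best_j)
    return pairs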
Step 6, back calculating the real pose of the sensor
After the matching point pairs are obtained, the motion between the imaging centres of the real scene image and the rendered image is calculated from the matching point pairs, and the real pose of the sensor is then back-calculated from the previously estimated initial pose. The invention uses the epipolar geometric constraint from the SLAM framework to recover the motion of the imaging centre between the two images through the correspondence between their matching points. As shown in FIG. 3, to calculate the motion relation between the real scene image I1 and the rendered image I2, let the motion be R, t, let the camera (sensor) centres be O1 (real scene image) and O2 (rendered image), and let p1 and p2 be a matching point pair. With known camera intrinsics, p1 and p2, and the real coordinates of O2 obtained from the geospatial three-dimensional model, the real coordinates of O1 can be calculated using the projection relation. The epipolar constraint equation (6) is obtained:
x2ᵀ E x1 = 0, p2ᵀ F p1 = 0, E = t^R, F = K⁻ᵀ E K⁻¹ (6)
where E and F are the Essential Matrix and the Fundamental Matrix respectively, t^ denotes the skew-symmetric matrix of t, K is the camera intrinsic matrix, and x1 and x2 are the coordinates of the two pixel points on the normalized plane. Equation (6) concisely gives the spatial position relationship of two matching points, so the camera pose estimation problem is split into two steps: first solve E or F from the pixel positions of the matching point pairs, then solve R, t from E or F. Since the essential matrix E is formally more compact, E is chosen. By its definition and properties E has 5 degrees of freedom, so a minimum of 5 points can solve it; considering its inherent nonlinearity, the invention solves it with the Eight-point algorithm, as follows. For one pair of matching points, let the normalized coordinates be x1 = [u1, v1, 1]ᵀ and x2 = [u2, v2, 1]ᵀ; from equation (6) we obtain:
[u2, v2, 1] E [u1, v1, 1]ᵀ = 0 (7)
Unfolding the matrix E into vector form gives e = [e1, e2, e3, e4, e5, e6, e7, e8, e9]ᵀ, and the epipolar constraint can be written in the linear form [u2u1, u2v1, u2, v2u1, v2v1, v2, u1, v1, 1]·e = 0. The other matching pairs are expressed in the same way, and stacking all the points into one matrix equation gives:
A·e = 0, where each of the eight rows of the coefficient matrix A is [u2u1, u2v1, u2, v2u1, v2v1, v2, u1, v1, 1] evaluated at one matching point pair (8)
if the matrix satisfies the full rank condition, the essential matrix E can be solved. The last step is to recover the motion R, t of the two imaging centers by the estimated essential matrix E, and the process is completed by Singular Value Decomposition (SVD).
Finally, the real pose of the sensor is estimated from its initial pose and the estimated R and t, achieving refined geographic registration. Note that the calculated real pose is relative to the three-dimensional model coordinate system, so its real-world accuracy depends on the geographic precision of the three-dimensional model.
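A sketch of the step-6 geometry with NumPy only, under simplifying assumptions (normalized coordinates already computed with the camera intrinsics, no RANSAC, no coordinate conditioning): the essential matrix is estimated from eight matching pairs with the eight-point algorithm of equations (7)-(8) and decomposed by SVD into the candidate rotations and the translation direction. Selecting the physically valid (R, t) by triangulation and composing it with the initial pose is omitted here.

# Sketch of the eight-point estimation of E and its SVD decomposition into R, t candidates.
import numpy as np

def eight_point_essential(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
    """x1, x2: (N, 2) normalized coordinates of matching points (N >= 8)."""
    rows = []
    for (u1, v1), (u2, v2) in zip(x1, x2):
        rows.append([u2 * u1, u2 * v1, u2, v2 * u1, v2 * v1, v2, u1, v1, 1.0])
    A = np.asarray(rows)
    _, _, Vt = np.linalg.svd(A)              # e is the right singular vector of the smallest
    E = Vt[-1].reshape(3, 3)                 # singular value, reshaped row-major to 3x3
    U, S, Vt = np.linalg.svd(E)              # project onto the essential-matrix manifold:
    E = U @ np.diag([1.0, 1.0, 0.0]) @ Vt    # two equal singular values, third zero
    return E

def decompose_essential(E: np.ndarray):
    """Return the two candidate rotations and the translation direction (up to sign/scale)."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]                              # translation direction; scale is not observable
    return R1, R2, t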
Step 7, target positioning
After the real pose of the sensor has been obtained and the instance targets in the real scene image have been detected in step 2, the instance targets are positioned using the depth information and the orientation of the real scene image. To reduce computation, after the instance targets are detected, the pixel centroid of each instance target is taken as the centre of that instance. Positioning is completed with the depth information of the image as follows:
let P = [ u, v, d ] represent a certain point in the image, where u, v represent pixel coordinates of the image and d represents the corresponding depth value. From the camera (sensor) internal parameters and equation (9), the camera coordinates of the point can be calculated.
XP = (u - xc)·d/fx, YP = (v - yc)·d/fy, ZP = d (9)
where (xc, yc) is the principal point of the camera, fx and fy are the focal lengths of the camera, and (XP, YP, ZP) are the camera coordinates of point P in the current pose. The coordinates of the target are then converted into the model (real-world) geographic coordinate system according to the geographic coordinates of the sensor, positioning the target and supporting subsequent higher-level tasks. For example, when the detected target is a dynamic target, its coherent motion can be visualized to meet visualization requirements.
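A sketch of the step-7 positioning under the assumptions above (the intrinsic values and the pose names R_wc, t_wc are hypothetical): the instance centroid is back-projected with equation (9) and then transformed into the model's geographic coordinate system with the refined sensor pose from step 6.

# Sketch of back-projecting an instance centroid and transforming it into the geographic frame.
import numpy as np

def pixel_to_camera(u: float, v: float, d: float, fx: float, fy: float,
                    xc: float, yc: float) -> np.ndarray:
    """Equation (9): pixel coordinates plus depth -> camera coordinates."""
    return np.array([(u - xc) * d / fx, (v - yc) * d / fy, d])

def camera_to_geographic(p_cam: np.ndarray, R_wc: np.ndarray, t_wc: np.ndarray) -> np.ndarray:
    """Rigid transform of a camera-frame point into the model (real-world) geographic frame."""
    return R_wc @ p_cam + t_wc

# Example: locate one detected instance whose pixel centroid is (412, 305) with depth 17.3 m.
K = dict(fx=1450.0, fy=1450.0, xc=960.0, yc=540.0)      # assumed camera intrinsics
p_cam = pixel_to_camera(412, 305, 17.3, **K)
# R_wc, t_wc would come from the refined sensor pose obtained in step 6.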
Example 2:
the main difference between the present embodiment and embodiment 1 lies in the extraction of key points and the construction of descriptors.
To simplify key point extraction, in this embodiment only one square frame is constructed for each pixel point on the contour line; when the included angle between the two intersection points of the square frame with the contour line and the central point (the pixel point) is within the set angle threshold range, the point is taken as a candidate corner point, and the candidate corner points are taken as key points. For example, only a square frame with a side length of 11 pixels is used: the smaller the side length of the square frame, the closer the two intersection points are to the central point, the stronger their association, and the better they characterize the central point. The central points whose included angle lies within [60°, 140°] are taken as candidate corner points, and these candidate corner points are taken as key points.
When the descriptor is constructed, because the key point is extracted by only one square frame, the included angle between the central point and two intersection points of the square frame and the contour line is used as the first descriptor of the key point; and using the label value of each pixel point on the square frame according to the set moving sequence as a second descriptor of the key point. Only one descriptor may be constructed, or two descriptors may be constructed simultaneously, in conformity with embodiment 1. Similarly, the matching method of the keypoints is also the same as that in embodiment 1, and the matching point pair may be determined by using the matching degree of only one descriptor, or may be determined by weighting the matching degrees of two descriptors.

Claims (10)

1. A target positioning method based on panorama segmentation is characterized by comprising the following steps:
1) Acquiring a real scene image of a target area by using an image acquisition sensor, and acquiring an initial pose of the sensor; rendering the established geographic space three-dimensional model of the target area by using the initial pose of the sensor to obtain a rendered image of the target area under the initial pose;
2) Respectively carrying out panoramic segmentation on the rendered image and the real scene image of the target area to obtain segmentation results of the rendered image and the real scene image; respectively carrying out corner detection on the segmentation results of the rendered image and the real scene image to obtain candidate corners in each image, and determining key points of the corresponding images according to the candidate corners;
3) Constructing descriptors by using key points of the two images, and determining matching point pairs of the two images according to the matching degree of the descriptors in the two images;
4) The real pose of the sensor is back-calculated from the matching point pairs and the initial pose of the image acquisition sensor;
5) And positioning the target in the real scene by using the real pose of the sensor.
2. The target positioning method based on panorama segmentation of claim 1, wherein the key points are determined as follows: the pixel points on a target contour line in the image segmentation result are traversed; a square frame is established with the pixel point as its central point, and the square frame intersects the target contour line at two intersection points; the included angle formed by the two intersection points and the central point is calculated; the pixel points whose included angle lies within a set angle threshold range are taken as candidate corner points, and the candidate corner points are taken as key points.
3. The target positioning method based on panorama segmentation according to claim 1, wherein the key points are determined as follows: the pixel points on a target contour line in the image segmentation result are traversed; two square frames with different side lengths are established with the pixel point as their central point, and each square frame intersects the target contour line at two intersection points; the included angle formed by the two intersection points and the central point is calculated for each square frame; when the included angles under both square frames are within a set angle threshold range and the difference between the two included angles is smaller than a set difference threshold, the pixel point is taken as a candidate corner point, and the key points are determined from the candidate corner points.
4. The object localization method based on panorama segmentation according to claim 3, wherein when there are multiple candidate corner points in the set corner point region, the candidate corner point with the smallest angular difference between two included angles is selected as the key point in the set corner point region.
5. The target positioning method based on panorama segmentation according to claim 3, wherein descriptors are constructed for the key points, the descriptors comprising a first descriptor and/or a second descriptor; the first descriptor is the mean of the two included angles obtained for the key point, and the second descriptor consists of the label values of the pixel points on the square frame with the smaller side length established at the key point, read in a set traversal order.
6. The target positioning method based on panoramic segmentation of claim 5, wherein the matching degree of the keypoints of the two images is determined according to the quotient of the first descriptors of the keypoints in the two images, and the matching degree of the keypoints of the two images is higher when the quotient is closer to 1.
7. The target positioning method based on panorama segmentation according to claim 5, wherein the matching degree of the keypoints is determined according to the alignment degree of the second descriptors of the keypoints in the two images, wherein the alignment degree is a ratio of the number of labels at the same position in the second descriptors of the two keypoints to the total number of labels, and the closer the ratio is to 1, the higher the matching degree of the keypoints in the two images is.
8. The target positioning method based on panoramic segmentation of claim 5, wherein the determination formula of the matching degree of the two image key points is as follows:
match = ω1·match1 + ω2·match2
where match is the matching degree of the two image key points, match1 is the matching degree of the two image key points determined with the first descriptor, match2 is the matching degree determined with the second descriptor, ω1 and ω2 are the weight coefficients of match1 and match2 respectively, and ω1 + ω2 = 1.
9. The method for locating the target based on the panorama segmentation according to claim 1, wherein the true pose of the sensor is calculated in the step 4) by adopting an epipolar geometric constraint method.
10. The target positioning method based on panoramic segmentation of claim 1, wherein a MaskFormer network is used in the step 2) to perform panoramic segmentation of the image.
CN202210199052.0A 2022-03-02 2022-03-02 Target positioning method based on panoramic segmentation Pending CN115272450A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210199052.0A CN115272450A (en) 2022-03-02 2022-03-02 Target positioning method based on panoramic segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210199052.0A CN115272450A (en) 2022-03-02 2022-03-02 Target positioning method based on panoramic segmentation

Publications (1)

Publication Number Publication Date
CN115272450A true CN115272450A (en) 2022-11-01

Family

ID=83758965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210199052.0A Pending CN115272450A (en) 2022-03-02 2022-03-02 Target positioning method based on panoramic segmentation

Country Status (1)

Country Link
CN (1) CN115272450A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115775325A (en) * 2023-01-29 2023-03-10 摩尔线程智能科技(北京)有限责任公司 Pose determination method and device, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination